How Claude Works — Models, Context, and Prompts
How Claude Works
Models, context windows, tokens, prompts, and temperature — the essential mechanics you need to understand before you can use Claude like a pro.
How Large Language Models Work (Simplified)
Before diving into Claude’s specific features, it helps to understand the basic idea behind all large language models (LLMs). Do not worry — you do not need a PhD in machine learning. The core concept is surprisingly intuitive.
An LLM is a massive neural network that has been trained on enormous amounts of text — books, articles, websites, code repositories, academic papers, and more. During training, the model learns statistical patterns: given a sequence of words, what is the most likely next word? It does this billions of times across trillions of words, gradually building an incredibly rich internal representation of language, logic, facts, and reasoning patterns.
When you send Claude a message, the model does not “look up” an answer in a database. Instead, it generates a response one token at a time, where each token is predicted based on everything that came before it — your message, the conversation history, and the model’s learned patterns. Think of it as an extraordinarily sophisticated autocomplete that understands context, nuance, and logic at a level that often feels like genuine understanding.
Tokens and the Context Window
One of the most important concepts to understand is tokens. A token is the basic unit that language models work with. In English, one token is roughly three-quarters of a word. The word “understanding” might be split into two tokens: “under” and “standing”. Short common words like “the” or “is” are single tokens. Code and technical text often use more tokens per word because of special characters and syntax.
The context window is the total amount of text — measured in tokens — that the model can “see” at one time. Think of it as the model’s working memory. Everything in the context window — your current message, all previous messages in the conversation, any uploaded documents, and the system prompt — must fit within this limit.
Claude’s 200,000-token context window is one of the largest in the industry. In practical terms, you can paste an entire novel, a full codebase, or hundreds of pages of legal documents into a single conversation — and Claude can reason about all of it simultaneously. This is a game-changer for tasks like document analysis, code review, and research synthesis where having the full picture matters.
However, context windows have an important limitation: once a conversation exceeds the limit, the oldest messages are silently dropped. The model does not warn you. It simply loses access to the earliest parts of the conversation. This is why long conversations can sometimes feel like Claude has “forgotten” something you discussed earlier — it literally has, because that information no longer fits in the window.
Model Tiers in Practice: Opus vs. Sonnet vs. Haiku
You learned about the three Claude tiers in the previous lesson. Now let us look at when to use each one — because choosing the right model is one of the simplest ways to get better results.
| Scenario | Best Model | Why |
|---|---|---|
| Analyzing a 200-page contract | Opus | Complex reasoning over long context |
| Writing a marketing email | Sonnet | Good writing, fast turnaround |
| Classifying 10,000 support tickets | Haiku | High volume, low cost, simple task |
| Debugging a complex codebase | Opus | Needs deep multi-file reasoning |
| Brainstorming blog topics | Sonnet | Creative, fast, cost-effective |
| Quick translation of a sentence | Haiku | Simple task, speed matters |
On claude.ai, the default model is Sonnet, which is the right choice for 80% of everyday tasks. You can switch to Opus for demanding work using the model selector dropdown. Haiku is primarily available through the API, where developers build it into automated pipelines that process thousands of requests.
Understanding Prompts
A prompt is simply the text you send to Claude. It can be a question, an instruction, a document to analyze, or a conversation. The quality of your prompt is the single biggest factor in the quality of Claude’s response. This is so important that an entire discipline — prompt engineering — has emerged around the art and science of writing effective prompts.
At its core, the prompt is Claude’s only window into what you want. Claude cannot read your mind. It does not know your background, your industry, your preferences, or your goals unless you tell it. The more context and clarity you provide, the better the output. A vague prompt like “write me an email” will produce a generic result. A specific prompt like “write a professional follow-up email to a client who missed our product demo, tone should be warm but urgent, mention the recording link” will produce something you can actually send.
System Prompts — The Hidden Instruction Layer
Beyond what you type in the chat, there is another layer of instructions that shapes Claude’s behavior: the system prompt. This is a special message, usually set by the developer or platform, that Claude receives before your conversation begins. It tells Claude how to behave — what role to play, what tone to use, what constraints to follow, and what information to prioritize.
When you use claude.ai directly, Anthropic sets a default system prompt that instructs Claude to be helpful, harmless, and honest. But when businesses build Claude into their products via the API, they write custom system prompts tailored to their use case. For example, a customer service bot might have a system prompt like: “You are a support agent for Acme Corp. Only answer questions about our products. Never discuss competitors. Always be polite and offer to escalate to a human agent if the issue is complex.”
You can also use system prompts in your own workflows through the API or through tools like Claude Code. This is an incredibly powerful technique that we will explore in depth in later sections — it essentially lets you “program” Claude’s personality and behavior for specific tasks.
Temperature — Controlling Creativity vs. Precision
Temperature is a parameter that controls how “creative” or “random” Claude’s responses are. It is a number between 0 and 1. Understanding temperature helps you get the right type of output for different tasks.
Claude always picks the most likely next token. Responses are consistent and repeatable. Best for factual tasks: data extraction, classification, code generation where correctness is paramount.
A mix of reliability and variety. Good for general writing, analysis, and most everyday tasks. This is roughly what you experience on claude.ai by default.
Claude samples from a wider range of possible tokens. Responses are more varied, surprising, and creative. Best for brainstorming, storytelling, and generating diverse ideas.
On claude.ai, you do not have direct control over temperature — Anthropic sets a sensible default. But when using the API, you can set temperature precisely for each request. This is especially useful in production applications: you might use temperature 0 for extracting data from invoices (where you want exact, repeatable results) and temperature 0.8 for generating marketing copy (where variety and creativity are valued).
The Conversation as Input → Output
One final concept that ties everything together: every interaction with Claude is fundamentally an input-output operation. You provide an input (the full conversation context, including the system prompt, all previous messages, and your latest message), and Claude produces an output (the next response). There is no persistent memory between conversations. There is no hidden state. Every response is generated purely from what is in the context window at that moment.
This means that the quality of your output is entirely determined by the quality of your input. If you give Claude clear instructions, relevant context, and specific examples of what you want — the output will be excellent. If you give it a vague one-liner with no context — you get a generic response. This principle is the foundation of everything you will learn in this course.
- LLMs like Claude generate text by predicting the most likely next token based on learned patterns from training data
- Claude’s 200K token context window (~150K words) lets you work with entire books, codebases, or document sets at once
- Use Opus for complex reasoning, Sonnet for daily tasks, and Haiku for fast high-volume processing
- Your prompt is the single biggest factor in output quality — be specific, provide context, state your goal
- System prompts let developers and power users program Claude’s behavior for specific use cases
- Temperature controls the creativity-precision tradeoff: 0 for factual tasks, higher for creative work
- Every conversation is stateless — Claude only knows what is in the current context window
Praktické vysvětlení toho, jak Claude funguje pod kapotou — od základů LLM a tokenových kontextových oken po úrovně modelů, základy prompt engineeringu a nastavení teploty.
There are no comments for now.