AI & LLM Fundamentals
What every professional needs to know about the technology reshaping work. Not the hype. The mechanics.
Most people use AI like a vending machine: put in a question, get an answer. They never look inside. That works fine until it doesn't. Until the answer is confidently wrong, or the output misses the point, or you hit a wall you can't diagnose.
“Understanding how the engine works doesn't make you a mechanic. It makes you a better driver.”
This page breaks down how large language models actually work. No jargon walls. No PhD required. Just the core mechanics that change how you prompt, evaluate, and collaborate with AI.
How LLMs Actually Work
Five steps from raw text to generated output. Scroll to see each stage come to life.
It starts with text
You type a sentence. The model doesn't read words like you do. It sees a stream of characters that it needs to break apart before it can do anything useful.
Breaking into tokens
The text is split into chunks: words, parts of words, or even individual characters. “Understanding” becomes [“Under”, “stand”, “ing”]. These chunks are called tokens, and they are the model's fundamental unit of meaning.
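A toy version of this splitting, using a hand-picked vocabulary, might look like the sketch below. Real tokenizers (BPE, WordPiece) learn their vocabularies from data; this one is purely illustrative.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, hand-picked
# vocabulary. Real tokenizers learn their vocabularies from huge corpora.
VOCAB = {"under", "stand", "ing", "token", "s", "the", "model"}

def tokenize(text: str) -> list[str]:
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary entry that matches at position i
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # Unknown character: fall back to a one-character token
                tokens.append(word[i])
                i += 1
    return tokens

print(tokenize("Understanding tokens"))  # → ['under', 'stand', 'ing', 'token', 's']
```

The key behavior carries over to real systems: common words survive as single tokens, rarer words get split into reusable pieces.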
Mapping to meaning
Each token becomes a vector: a list of numbers that captures its meaning. Similar words end up close together in this space. “Understanding,” “knowing,” and “grasping” are neighbors.
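A sketch with made-up three-dimensional vectors shows how that "closeness" is measured. Real embeddings have hundreds or thousands of dimensions, all learned during training; these numbers are invented for illustration.

```python
import math

# Made-up 3-dimensional embeddings. The geometry is what matters:
# related words point in similar directions.
EMB = {
    "understanding": [0.9, 0.8, 0.1],
    "knowing":       [0.85, 0.75, 0.15],
    "banana":        [0.1, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(EMB["understanding"], EMB["knowing"]))  # close to 1.0
print(cosine(EMB["understanding"], EMB["banana"]))   # much lower
```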
Paying attention
The attention mechanism weighs every token against every other token. “Bank” means different things next to “river” vs. “account.” Attention is how the model resolves ambiguity.
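A minimal sketch of scaled dot-product attention for a single query makes the weighing concrete. The two-dimensional vectors here are toy stand-ins for the "river" and "account" senses; real models use many attention heads over high-dimensional vectors.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Scores each key against the query, converts the scores into
    weights that sum to 1, and returns the weighted blend of values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, blended

# Toy vectors: this "bank" query points toward the "river" key,
# so "river" dominates the blended output.
keys   = [[1.0, 0.0], [0.0, 1.0]]   # "river", "account"
values = [[5.0, 0.0], [0.0, 5.0]]
weights, blended = attention([2.0, 0.2], keys, values)
print(weights)  # the first weight (for "river") is larger
```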
Generating one token at a time
The model predicts a probability for every possible next token, picks one (usually among the most likely), appends it to the sequence, and repeats. Every word you read in an AI response was chosen token by token, conditioned on everything that came before it.
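The loop itself is simple enough to sketch. Here a hand-written probability table stands in for the neural network; a real LLM computes these probabilities over the entire preceding sequence, but the generation loop has the same shape.

```python
# Toy next-token model: a hand-written table of P(next | current).
PROBS = {
    "the":      {"model": 0.6, "cat": 0.4},
    "model":    {"predicts": 0.9, "sleeps": 0.1},
    "predicts": {"tokens": 0.8, "<end>": 0.2},
    "tokens":   {"<end>": 1.0},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        options = PROBS.get(tokens[-1])
        if not options:
            break
        # Greedy decoding: always take the most likely next token
        # (temperature 0). Sampling would pick randomly by probability.
        next_token = max(options, key=options.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("the"))  # → "the model predicts tokens"
```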
The Context Window
Every LLM has a limit: the context window. Think of it as a desk that can only hold so many papers at once.
It has a hard limit
GPT-4 Turbo holds ~128k tokens; Claude models hold up to 200k. Anything beyond the limit is truncated — the model simply never sees it.
Position matters
Information at the start and end of the window gets more attention. The middle tends to get "lost."
Quality beats quantity
A focused 500-token prompt often outperforms a rambling 5,000-token one. Every token competes for attention.
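One common way to respect the budget is to keep the instructions and the most recent context and drop the middle. A sketch, using a crude word count in place of real token counting (the function and its assumptions are illustrative — production code would use the model's actual tokenizer):

```python
def fit_to_budget(messages: list[str], budget: int,
                  count_tokens=lambda s: len(s.split())) -> list[str]:
    """Keep the first and most recent messages within a token budget.

    Assumes the first message carries the instructions and the most
    recent messages carry the live context — the two regions the
    model attends to best.
    """
    if sum(count_tokens(m) for m in messages) <= budget:
        return messages
    kept = [messages[0]]                 # always keep the instructions
    used = count_tokens(messages[0])
    tail = []
    for msg in reversed(messages[1:]):   # fill from the newest backward
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        tail.append(msg)
        used += cost
    return kept + tail[::-1]
```

Dropping the middle first mirrors what attention does anyway: the start and end of the window get the most weight.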
Why This Changes How You Prompt
Understanding the mechanics transforms a vague request into a precision tool.
The Naive Prompt
Tell me about project management
Vague, no context, no constraints
The naive prompt
This is where most people start. A single sentence with no context, no constraints, no structure. The model will answer, but it's guessing what you actually want.
With Context Window Awareness
Role: You are a senior PM consultant. Context: I lead a 12-person SaaS team transitioning from waterfall to agile. Task: Outline the top 5 risks in this transition and how to mitigate each.
Structured: role + context + scope
With context window awareness
Knowing that attention favors the beginning and end of the context, you front-load the essential information: your role, the situation, the specific output you need. Structure gives the attention mechanism something to latch onto.
With Token Awareness
Role: Senior PM consultant. Context: 12-person SaaS team, waterfall → agile. Task: Top 5 transition risks + mitigations. Format: Numbered list, one sentence each.
Concise: removed filler, precise language
With token awareness
Every filler word is a wasted token. You trim the prompt to its essence: precise language, clear format specification. The model spends its context budget on what matters.
With Temperature Awareness
Role: Senior PM consultant. Context: 12-person SaaS team, waterfall → agile. Task: Top 5 transition risks + mitigations. Format: Numbered list, one sentence each. Style: Factual, no speculation. Temperature: 0.3 for accuracy.
Complete: structure + efficiency + control
With temperature awareness
The final layer: controlling the model's randomness. For a factual analysis, you want low temperature; for creative brainstorming, higher. Note that temperature is set as an API or interface parameter rather than in the prompt text itself — writing it into the prompt signals your intent, but only the actual setting changes the sampling.
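The knob itself is just a divisor applied to the model's raw scores (logits) before they become probabilities. A minimal sketch in plain Python, with made-up example scores for three candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into probabilities, scaled by temperature.

    Low temperature sharpens the distribution toward the top token;
    high temperature flattens it, giving unlikely tokens a real chance.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up raw scores for three candidate tokens
print(softmax_with_temperature(logits, 0.3))  # top token dominates
print(softmax_with_temperature(logits, 1.5))  # probabilities spread out
```

This is why temperature 0 gives near-deterministic output (the top token takes almost all the probability mass) and higher values produce variety.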
Temperature & Hallucinations
Two concepts that explain most of the surprises people encounter with AI.
Temperature
Controls randomness in the output. At 0, the model always picks the highest-probability token. At 1+, it takes creative risks. Match the setting to your task.
The Confidence Illusion
LLMs sound confident even when wrong. They don't "know" things; they predict the next likely token. A fluent answer is not a correct answer.
Hallucination
Not a bug, but a feature of probabilistic generation. The model fills gaps with plausible text. Critical thinking and verification are essential partners.
Practical Implications
Four principles that improve every interaction with AI, starting today.
Structure beats cleverness
A well-structured prompt with role, context, and constraints outperforms a clever one-liner every time. The model needs scaffolding, not poetry.
Context is currency
Every token in the context window competes for the model's attention. Use them wisely: trim filler, front-load the important information.
Verify, don't trust
LLMs are probabilistic, not factual. They will confidently generate plausible text that is completely wrong. Build verification into your workflow.
Temperature is a tool
Low temperature for facts and code. Higher for brainstorming and creative writing. Matching the setting to the task is a core skill.
Related Skill Deep-Dives
Each skill deep-dive on the Skills & Framework page explores a specific domain in depth, combining theory, practical frameworks, and real-world application.
Ready to Go Deeper?
I help teams build real AI fluency, from understanding the fundamentals to designing workflows that compound over time.