Daniel Carral · The Future of Work. NOW.
Skill Deep-Dive

AI & LLM Fundamentals

What every professional needs to know about the technology reshaping work. Not the hype. The mechanics.


Most people use AI like a vending machine: put in a question, get an answer. They never look inside. That works fine until it doesn't. Until the answer is confidently wrong, or the output misses the point, or you hit a wall you can't diagnose.

“Understanding how the engine works doesn't make you a mechanic. It makes you a better driver.”

This page breaks down how large language models actually work. No jargon walls. No PhD required. Just the core mechanics that change how you prompt, evaluate, and collaborate with AI.

How LLMs Actually Work

Five steps from raw text to generated output. Scroll to see each stage come to life.

[Animated diagram: input text → tokens → embeddings → attention → next-token prediction]

It starts with text

You type a sentence. The model doesn't read words like you do. It sees a stream of characters that it needs to break apart before it can do anything useful.

[Animated diagram: “Understanding” split into subword tokens]

Breaking into tokens

The text is split into chunks: words, parts of words, or even individual characters. “Understanding” might become [“Under”, “stand”, “ing”] (the exact split depends on the tokenizer). These chunks are called tokens, and they are the model's basic units of input.
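A minimal sketch of the idea. Real models use byte-pair encoding over a learned vocabulary of tens of thousands of chunks; the tiny vocabulary and greedy longest-match rule here are invented for illustration:

```python
# Toy greedy tokenizer. Real LLMs use byte-pair encoding (BPE),
# but the core idea is the same: split text into known chunks.
VOCAB = {"Under", "stand", "ing", "mean", "context", "the"}  # hypothetical vocabulary

def tokenize(text: str) -> list[str]:
    """Split text into the longest vocabulary chunks, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown text: fall back to a single character
            i += 1
    return tokens

print(tokenize("Understanding"))  # ['Under', 'stand', 'ing']
```

This is also why LLMs can stumble on spelling or counting letters: they never see characters, only chunks.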

[Animated diagram: similar meanings cluster together in embedding space]

Mapping to meaning

Each token becomes a vector: a list of numbers that captures its meaning. Similar words end up close together in this space. “Understanding,” “knowing,” and “grasping” are neighbors.
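"Close together" has a precise meaning: the angle between vectors. A sketch with invented 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, learned from data):

```python
import math

# Toy "embeddings"; the numbers are invented for illustration.
embeddings = {
    "understanding": [0.90, 0.80, 0.10],
    "knowing":       [0.85, 0.75, 0.15],
    "banana":        [0.10, 0.20, 0.90],
}

def cosine_similarity(a, b):
    """1.0 = pointing the same way (similar meaning); near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["understanding"], embeddings["knowing"]))  # close to 1
print(cosine_similarity(embeddings["understanding"], embeddings["banana"]))   # much lower
```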

[Animated diagram: attention arcs between tokens; thicker arcs = stronger attention]

Paying attention

The attention mechanism weighs every token against every other token. “Bank” means different things next to “river” vs. “account.” Attention is how the model resolves ambiguity.
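Under the hood, "weighing tokens against each other" is scaled dot-product attention: score, normalize, blend. A stripped-down sketch with invented 2-dimensional vectors:

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """One attention step: score the query against every key,
    turn scores into weights, and blend the values accordingly."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, blended

# Toy vectors for the tokens ["bank", "river", "money"]; numbers are invented.
keys   = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = keys
query  = [1.0, 0.0]  # "bank" asking: which neighbors matter to me here?

weights, blended = attention(query, keys, values)
print(weights)  # "river" gets a notably higher weight than "money"
```

In a real model this happens in parallel across many "heads" and dozens of layers; the arithmetic above is one head, one step.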

[Animated diagram: next-token prediction, “the” 42%, “is” 28%, “of” 15%, “means” 10%]

Generating one token at a time

The model assigns a probability to every possible next token, picks one, adds it to the sequence, and repeats. Every word you read in an AI response is there because the model judged it likely given everything that came before it.
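The generation loop itself is simple. Here the "model" is a hypothetical lookup table of next-token probabilities; a real LLM computes these from billions of parameters, but the loop is the same:

```python
# Invented next-token probability table (token -> list of (candidate, probability)).
NEXT_TOKEN = {
    "the":     [("meaning", 0.6), ("point", 0.4)],
    "meaning": [("of", 0.7), ("is", 0.3)],
    "of":      [("context", 0.9), ("<end>", 0.1)],
    "context": [("<end>", 1.0)],
}

def generate(prompt_tokens, max_tokens=10):
    """Autoregressive generation: predict, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN.get(tokens[-1], [("<end>", 1.0)])
        # Greedy decoding: always take the highest-probability candidate.
        next_token = max(candidates, key=lambda c: c[1])[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["the"]))  # ['the', 'meaning', 'of', 'context']
```

Swap the greedy `max` for random sampling over the candidates and you get the variability that temperature (below) controls.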

The Context Window

Every LLM has a limit: the context window. Think of it as a desk that can only hold so many papers at once.

[Animated diagram: the context window filling up]

It has a hard limit

GPT-4 can hold ~128k tokens. Claude can hold up to 200k. Beyond that, information is simply lost.

Position matters

Information at the start and end of the window gets more attention. The middle tends to get "lost."

Quality beats quantity

A focused 500-token prompt often outperforms a rambling 5,000-token one. Every token competes for attention.
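What "information is simply lost" looks like in practice: one common strategy chat applications use is a sliding window that silently drops the oldest tokens. A minimal sketch (the window size here is invented; real limits are ~128k-200k tokens):

```python
CONTEXT_WINDOW = 8  # hypothetical tiny window for illustration

def fit_to_window(tokens, window=CONTEXT_WINDOW):
    """Keep only the most recent tokens that fit in the window."""
    if len(tokens) <= window:
        return tokens
    return tokens[-window:]  # everything earlier is invisible to the model

conversation = [f"turn{i}" for i in range(1, 13)]  # 12 tokens of history
print(fit_to_window(conversation))  # only the last 8 survive
```

The model never signals that this happened, which is why long conversations can "forget" instructions given at the start.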

Why This Changes How You Prompt

Understanding the mechanics transforms a vague request into a precision tool.

The Naive Prompt

Tell me about project management

Vague, no context, no constraints

The naive prompt

This is where most people start. A single sentence with no context, no constraints, no structure. The model will answer, but it's guessing what you actually want.

With Context Window Awareness

Role: You are a senior PM consultant.
Context: I lead a 12-person SaaS team
  transitioning from waterfall to agile.
Task: Outline the top 5 risks in this
  transition and how to mitigate each.

Structured: role + context + scope

With context window awareness

Knowing that every token competes for the model's attention, you front-load the essential context: your role, the situation, the specific output you need. Structure gives the attention mechanism something to latch onto.

With Token Awareness

Role: Senior PM consultant.
Context: 12-person SaaS team, waterfall → agile.
Task: Top 5 transition risks + mitigations.
Format: Numbered list, one sentence each.

Concise: removed filler, precise language

With token awareness

Every filler word is a wasted token. You trim the prompt to its essence: precise language, clear format specification. The model spends its context budget on what matters.

With Temperature Awareness

Role: Senior PM consultant.
Context: 12-person SaaS team, waterfall → agile.
Task: Top 5 transition risks + mitigations.
Format: Numbered list, one sentence each.
Style: Factual, no speculation.
Temperature: 0.3 for accuracy.

Complete: structure + efficiency + control

With temperature awareness

The final layer: controlling the model's randomness. For a factual analysis, you want low temperature. For creative brainstorming, higher. In most tools, temperature is an API or settings parameter rather than a line in the prompt, but making the intent explicit removes a source of unpredictable output.

Temperature & Hallucinations

Two concepts that explain most of the surprises people encounter with AI.

Temperature

Controls randomness in the output. At 0, the model always picks the highest-probability token. At 1+, it takes creative risks. Match the setting to your task.
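Mechanically, temperature rescales the model's raw scores before they become probabilities. A sketch with invented scores for four candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores into probabilities.
    Low temperature sharpens the distribution (temperature 0 is the
    limit: always pick the max); high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next tokens
logits = [2.0, 1.0, 0.5, 0.1]

print(softmax_with_temperature(logits, 0.3))  # top token dominates: predictable
print(softmax_with_temperature(logits, 1.5))  # probabilities even out: creative
```

Same model, same scores: the sampling knob alone decides whether the output is repeatable or surprising.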

The Confidence Illusion

LLMs sound confident even when wrong. They don't "know" things; they predict the next likely token. A fluent answer is not a correct answer.

Hallucination

Not a bug so much as a consequence of probabilistic generation. The model fills gaps with plausible text. Critical thinking and verification are essential partners.

Practical Implications

Four principles that improve every interaction with AI, starting today.

Structure beats cleverness

A well-structured prompt with role, context, and constraints outperforms a clever one-liner every time. The model needs scaffolding, not poetry.

Context is currency

Every token in the context window competes for the model's attention. Use them wisely: trim filler, front-load the important information.

Verify, don't trust

LLMs are probabilistic, not factual. They will confidently generate plausible text that is completely wrong. Build verification into your workflow.

Temperature is a tool

Low temperature for facts and code. Higher for brainstorming and creative writing. Matching the setting to the task is a core skill.

Related Skill Deep-Dives

Each skill deep-dive on the Skills & Framework page explores a specific domain in depth, combining theory, practical frameworks, and real-world application.

Ready to Go Deeper?

I help teams build real AI fluency, from understanding the fundamentals to designing workflows that compound over time.