How LLMs Work
This is a narrative theme tracing the technical pipeline through which a large language model is built and operated — from raw internet data to deployed assistant. Synthesised from Deep Dive into LLMs like ChatGPT and Intro to Large Language Models by Andrej Karpathy.
The core claim
An LLM is not magic. It is the output of a three-stage training pipeline applied to large quantities of text. Understanding each stage explains both the model’s remarkable capabilities and its systematic limitations.
Stage 1: Pretraining — knowledge acquisition
Tens of terabytes of internet text are filtered, tokenised, and used to train a Transformer neural network to predict the next token. This is lossy compression: the internet is compressed roughly 100× into the model’s parameters. The result is a base model — an internet document simulator with no assistant behaviour.
What the base model knows: statistical patterns over internet text. Things mentioned often → recalled reliably. Things mentioned rarely → recalled vaguely or not at all. Knowledge of events after the training cutoff → none (without tools).
Scaling Laws make this predictable: add parameters and data, and loss improves smoothly. No sign of saturation at current scales.
Stage 2: Fine-tuning — alignment
The dataset switches from internet text to ~100,000 human-labelled Q&A conversations. The same training algorithm runs cheaply (hours). The model learns to imitate the assistant persona specified by the labelling instructions.
Pre-training is about knowledge; fine-tuning is about alignment. The model’s knowledge is not updated at this stage — only its output format and persona.
When you talk to ChatGPT, you are talking to a statistical simulation of a human labeller following OpenAI’s guidelines.
Stage 3: Reinforcement learning — emergent reasoning
In verifiable domains (maths, code), the model generates candidate solutions, scores them against correct answers, and trains on the successful ones. No human writes the “correct” reasoning path; the model discovers it.
This produces emergent chains of thought — the model learns to say “wait, let me reconsider” because doing so improves accuracy, not because a human wrote that. The AlphaGo analogy: RL discovers Move 37; human experts hadn’t found it.
In unverifiable domains, RLHF substitutes human preference rankings as a proxy reward. This polishes the model but cannot run indefinitely — reward hacking caps the improvement.
The cognitive profile that results
- Hallucination: confident confabulation when knowledge is absent. Mitigation: refusal training + tools.
- Jagged Intelligence: arbitrary capability gaps reflecting what has and hasn’t been developed or included in RL training.
- Tokens to think: reasoning must be distributed across tokens; chain-of-thought prompting allocates compute time.
- Context Window: working memory only — precise and current, unlike parametric memory.
- Stateless: no persistence across sessions without explicit memory features.
The key insight for users
The model is not a database, a search engine, or an oracle. It is a statistical engine trained on human text that has learnt to imitate helpful human responses. Its knowledge is probabilistic, its recall is vague, and it has no awareness of its own limits without training that specifically surfaces those limits.
Working effectively with LLMs means exploiting their strengths (broad knowledge, fluent generation, tool integration) while compensating for their weaknesses (hallucination, arithmetic errors, character-level blindness).