How I Use LLMs
Author: Andrej Karpathy
Date: February 2025
Video: https://www.youtube.com/watch?v=EWvNQjAaOHw
Source file: raw/llm-use-script.md
A practical companion to Deep Dive into LLMs like ChatGPT. Walks through the LLM ecosystem, demonstrates real use cases with screen-share examples, and distils habits for working effectively with these tools.
What this talk covers
Practical demonstrations organised by capability:
- The LLM ecosystem — ChatGPT, Claude, Gemini, Grok, DeepSeek; Chatbot Arena and SEAL leaderboard.
- How interaction works under the hood — token stream, chat format, context window as working memory.
- Pricing tiers — free vs Plus vs Pro; what you give up at each tier.
- Thinking models — when to use reasoning models vs standard models.
- Tool: internet search — time-sensitive queries, when to trigger search.
- Tool: deep research — multi-search research reports; verifying citations.
- Tool: file uploads — PDFs, books, papers in context.
- Tool: Python interpreter — arithmetic, data analysis, plotting.
- Claude Artifacts — inline rendered apps and Mermaid diagrams.
- Cursor — professional coding with full-codebase context.
- Audio input/output — speech transcription; Advanced Voice Mode.
- NotebookLM — generating custom podcasts from uploaded sources.
- Image input — OCR, medical results, supplement labels.
- Image output — DALL-E, Ideogram.
- Video input/output — point-and-talk on mobile; Sora, Veo 2.
- Memory and Custom Instructions — persistent personalisation.
- Custom GPTs — saved few-shot prompts.
Key concepts and tools
- Context Window — wiping context (new chat) removes working memory; irrelevant tokens can degrade output.
- Thinking Models — RL-trained models that reason before responding; trade speed for accuracy on hard problems.
- Tool Use — web search, Python interpreter, image generation integrated via special tokens.
- Hallucination — even with tools, models can hallucinate; verify citations and check generated code.
- Vibe Coding — narrating requirements to an agent (Cursor/Composer) and letting it write code autonomously.
Key habits distilled
Model hygiene:
- Know what model you are using (check the dropdown).
- Pay for the tier that matches your use — differences are material.
- Try the fast model first; escalate to a thinking model if the result seems wrong.
Context hygiene:
- Start a new chat when switching topics.
- For information the model must recall accurately, paste it into the context window — don’t rely on parametric memory.
Tool judgements:
- Use internet search for anything time-sensitive or about rapidly changing information.
- Use a Python interpreter for arithmetic, analysis, or counting — don’t trust mental arithmetic.
- Use file upload for reading papers or books alongside the model.
Interaction style:
- Few-shot prompts always outperform zero-shot; give examples of input/output format.
- Custom GPTs are saved few-shot prompts — create them for recurring tasks.
- Speak rather than type — faster, and transcription is now reliable.
Verification discipline:
- Read the code generated by Advanced Data Analysis.
- Follow citations from Deep Research reports.
- Ask the model to transcribe image content before relying on what it extracted.
On the LLM Council
Karpathy treats all frontier models as an “LLM Council.” For significant decisions — technical or personal — he asks multiple models and compares. Each model has different strengths; the disagreements are informative.
Relation to other talks
- Technical foundations covered in Deep Dive into LLMs like ChatGPT and Intro to Large Language Models.
- Professional coding with agents expanded in From Vibe Coding to Agentic Engineering.