How I Use LLMs

Author: Andrej Karpathy Date: February 2025 Video: https://www.youtube.com/watch?v=EWvNQjAaOHw Source file: raw/llm-use-script.md

A practical companion to Deep Dive into LLMs like ChatGPT. Walks through the LLM ecosystem, demonstrates real use cases with screen-share examples, and distils habits for working effectively with these tools.

What this talk covers

Practical demonstrations organised by capability:

The LLM ecosystem — ChatGPT, Claude, Gemini, Grok, DeepSeek; Chatbot Arena and SEAL leaderboard.
How interaction works under the hood — token stream, chat format, context window as working memory.
Pricing tiers — free vs Plus vs Pro; what you give up at each tier.
Thinking models — when to use reasoning models vs standard models.
Tool: internet search — time-sensitive queries, when to trigger search.
Tool: deep research — multi-search research reports; verifying citations.
Tool: file uploads — PDFs, books, papers in context.
Tool: Python interpreter — arithmetic, data analysis, plotting.
Claude Artifacts — inline rendered apps and Mermaid diagrams.
Cursor — professional coding with full-codebase context.
Audio input/output — speech transcription; Advanced Voice Mode.
NotebookLM — generating custom podcasts from uploaded sources.
Image input — OCR, medical results, supplement labels.
Image output — DALL-E, Ideogram.
Video input/output — point-and-talk on mobile; Sora, Veo 2.
Memory and Custom Instructions — persistent personalisation.
Custom GPTs — saved few-shot prompts.

Key concepts and tools

Context Window — wiping context (new chat) removes working memory; irrelevant tokens can degrade output.
Thinking Models — RL-trained models that reason before responding; trade speed for accuracy on hard problems.
Tool Use — web search, Python interpreter, image generation integrated via special tokens.
Hallucination — even with tools, models can hallucinate; verify citations and check generated code.
Vibe Coding — narrating requirements to an agent (Cursor/Composer) and letting it write code autonomously.

Key habits distilled

Model hygiene:

Know what model you are using (check the dropdown).
Pay for the tier that matches your use — differences are material.
Try the fast model first; escalate to a thinking model if the result seems wrong.

Context hygiene:

Start a new chat when switching topics.
For information the model must recall accurately, paste it into the context window — don’t rely on parametric memory.

Tool judgements:

Use internet search for anything time-sensitive or about rapidly changing information.
Use a Python interpreter for arithmetic, analysis, or counting — don’t trust mental arithmetic.
Use file upload for reading papers or books alongside the model.

Interaction style:

Few-shot prompts always outperform zero-shot; give examples of input/output format.
Custom GPTs are saved few-shot prompts — create them for recurring tasks.
Speak rather than type — faster, and transcription is now reliable.

Verification discipline:

Read the code generated by Advanced Data Analysis.
Follow citations from Deep Research reports.
Ask the model to transcribe image content before relying on what it extracted.

On the LLM Council

Karpathy treats all frontier models as an “LLM Council.” For significant decisions — technical or personal — he asks multiple models and compares. Each model has different strengths; the disagreements are informative.

Relation to other talks

Technical foundations covered in Deep Dive into LLMs like ChatGPT and Intro to Large Language Models.
Professional coding with agents expanded in From Vibe Coding to Agentic Engineering.