How I Use LLMs

How I Use LLMs

karpathy llm practical tools workflow ecosystem
Read transcript →

How I Use LLMs

Author: Andrej Karpathy Date: February 2025 Video: https://www.youtube.com/watch?v=EWvNQjAaOHw Source file: raw/llm-use-script.md

A practical companion to Deep Dive into LLMs like ChatGPT. Walks through the LLM ecosystem, demonstrates real use cases with screen-share examples, and distils habits for working effectively with these tools.


What this talk covers

Practical demonstrations organised by capability:

  1. The LLM ecosystem — ChatGPT, Claude, Gemini, Grok, DeepSeek; Chatbot Arena and SEAL leaderboard.
  2. How interaction works under the hood — token stream, chat format, context window as working memory.
  3. Pricing tiers — free vs Plus vs Pro; what you give up at each tier.
  4. Thinking models — when to use reasoning models vs standard models.
  5. Tool: internet search — time-sensitive queries, when to trigger search.
  6. Tool: deep research — multi-search research reports; verifying citations.
  7. Tool: file uploads — PDFs, books, papers in context.
  8. Tool: Python interpreter — arithmetic, data analysis, plotting.
  9. Claude Artifacts — inline rendered apps and Mermaid diagrams.
  10. Cursor — professional coding with full-codebase context.
  11. Audio input/output — speech transcription; Advanced Voice Mode.
  12. NotebookLM — generating custom podcasts from uploaded sources.
  13. Image input — OCR, medical results, supplement labels.
  14. Image output — DALL-E, Ideogram.
  15. Video input/output — point-and-talk on mobile; Sora, Veo 2.
  16. Memory and Custom Instructions — persistent personalisation.
  17. Custom GPTs — saved few-shot prompts.

Key concepts and tools

  • Context Window — wiping context (new chat) removes working memory; irrelevant tokens can degrade output.
  • Thinking Models — RL-trained models that reason before responding; trade speed for accuracy on hard problems.
  • Tool Use — web search, Python interpreter, image generation integrated via special tokens.
  • Hallucination — even with tools, models can hallucinate; verify citations and check generated code.
  • Vibe Coding — narrating requirements to an agent (Cursor/Composer) and letting it write code autonomously.

Key habits distilled

Model hygiene:

  • Know what model you are using (check the dropdown).
  • Pay for the tier that matches your use — differences are material.
  • Try the fast model first; escalate to a thinking model if the result seems wrong.

Context hygiene:

  • Start a new chat when switching topics.
  • For information the model must recall accurately, paste it into the context window — don’t rely on parametric memory.

Tool judgements:

  • Use internet search for anything time-sensitive or about rapidly changing information.
  • Use a Python interpreter for arithmetic, analysis, or counting — don’t trust mental arithmetic.
  • Use file upload for reading papers or books alongside the model.

Interaction style:

  • Few-shot prompts always outperform zero-shot; give examples of input/output format.
  • Custom GPTs are saved few-shot prompts — create them for recurring tasks.
  • Speak rather than type — faster, and transcription is now reliable.

Verification discipline:

  • Read the code generated by Advanced Data Analysis.
  • Follow citations from Deep Research reports.
  • Ask the model to transcribe image content before relying on what it extracted.

On the LLM Council

Karpathy treats all frontier models as an “LLM Council.” For significant decisions — technical or personal — he asks multiple models and compares. Each model has different strengths; the disagreements are informative.


Relation to other talks