How to Use LLMs Effectively

This is a narrative theme synthesising practical guidance for working with large language models — drawn primarily from How I Use LLMs and Deep Dive into LLMs like ChatGPT by Andrej Karpathy.

The central principle

Know what you’re talking to: a compressed, probabilistic snapshot of the internet, with a learned assistant persona, frozen at a training cutoff, running in a stateless session. Every practical habit flows from this model.

Habits by category

Model selection

Know which model you are using. Free tiers use smaller, weaker models. The difference in quality is material for professional use.
Match the model to the task. Try the fast, non-thinking model first. Escalate to a reasoning model (o3, Claude extended thinking, DeepSeek-R1) for hard maths, code, or multi-step reasoning. Non-thinking models are faster and usually sufficient for factual and creative tasks.
Use multiple models (the LLM Council). For important decisions — technical or personal — query multiple frontier models and compare. Disagreements are informative.

Context hygiene

Start a new chat when switching topics. Irrelevant tokens in the context window can degrade output and slow the model. Keep working memory clean.
Paste information in; don’t rely on recall. Parametric memory is vague and prone to hallucination. If you need accurate information, put it in the context window via file upload or copy-paste.

Tool use

Use web search for time-sensitive queries. Anything that could be answered by a Google search and skimming the top links: use the search tool.
Use a Python interpreter for arithmetic and analysis. Don’t trust the model’s mental arithmetic. A one-liner in Python is more reliable.
Use file upload for reading documents. Upload PDFs, papers, or web pages. The model reads alongside you as a knowledgeable collaborator.
Use deep research for multi-source research tasks. Expect a 10-minute run, treat the output as a first draft, and follow citations.

Prompting

Few-shot beats zero-shot. Don’t just describe the task — give examples of input and output. Always.
Let the model think. Don’t force an answer into the first token. Chain-of-thought prompting — “think step by step” — works because it distributes reasoning across tokens.
For recurring tasks, save few-shot prompts as Custom GPTs, Claude Projects, or equivalent presets.

Verification

Read generated code. The model is a capable but absent-minded analyst. It will silently substitute placeholder values, make arithmetic errors in narration, or miss a context-dependent constraint.
Follow citations from research outputs. Deep Research reports can hallucinate or misattribute. Verify what matters.
Ask the model to transcribe before trusting. When the model processes an image or document, ask it to transcribe what it extracted so you can verify accuracy.

Interface

Speak rather than type. Faster, and transcription (via Super Whisper or native mobile input) is now reliable for most purposes.
Use Custom Instructions for persistent preferences — tone, format, domain context — so you don’t re-explain them each session.

The verification discipline

Across all these habits, a consistent discipline: always verify what matters. The model is a powerful accelerator, not a source of truth. It will produce confidently wrong answers. The cost of that confidence is borne by the user, not the model.