Tool Use

Tool use is the capability of a large language model to invoke external systems — web search, code interpreters, file readers, image generators — by emitting special tokens that the surrounding application intercepts and acts upon.

The mechanism

The model is trained to emit special token sequences when it wants to use a tool. For example:

<SEARCH_START>when is White Lotus Season 3 Episode 2<SEARCH_END>

The inference application detects this sequence, pauses token generation, executes the search, and injects the results into the Context Window. Generation resumes with the retrieved information directly available.
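The detect, pause, execute, inject cycle can be sketched in a few lines of Python. This is a minimal illustration, not how any particular inference stack works: `fake_search` and `handle_tool_call` are hypothetical names, the search backend is a stub, and the markers are treated here as plain strings rather than as the special tokens a real model would emit.

```python
import re

SEARCH_START = "<SEARCH_START>"
SEARCH_END = "<SEARCH_END>"

# Matches one complete search call and captures the query between the markers.
TOOL_CALL = re.compile(
    re.escape(SEARCH_START) + r"(.*?)" + re.escape(SEARCH_END), re.DOTALL
)


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real search backend."""
    return f"[results for: {query}]"


def handle_tool_call(generated: str, search=fake_search) -> str:
    """Return the context the model would resume generating from.

    If the generated text contains a complete search call, everything after
    the call is dropped (generation was paused there) and the search result
    is appended. Otherwise the text is returned unchanged.
    """
    match = TOOL_CALL.search(generated)
    if match is None:
        return generated  # no tool call, nothing to inject
    query = match.group(1).strip()
    result = search(query)
    # Keep the text up to and including the call, then inject the result.
    return generated[: match.end()] + "\n" + result + "\n"


if __name__ == "__main__":
    text = "Let me check. <SEARCH_START>when is White Lotus Season 3 Episode 2<SEARCH_END>"
    print(handle_tool_call(text))
```

In a real system the returned string would be fed back to the model as its new context, so the next tokens it generates can draw on the injected result.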

The same pattern applies to:

Python interpreter — the model writes code; the application executes it and returns the output.
Calculator — arithmetic delegated to a reliable tool.
Image generation — the model writes a description; a separate image model generates the image.
File access / RAG — relevant passages retrieved from a document store and inserted into context.
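Because every tool in the list above follows the same request and response shape, an application can route calls through a single lookup table. The sketch below assumes a hypothetical `TOOLS` registry and `dispatch` helper; the calculator is a small, restricted arithmetic evaluator built on Python's `ast` module, and the search entry is a stub.

```python
import ast
import operator
from typing import Callable, Dict

# Arithmetic operators the hypothetical calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / on numeric literals without using eval()."""

    def ev(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return str(ev(ast.parse(expression, mode="eval").body))


# Hypothetical registry: tool name -> function taking and returning text.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "search": lambda query: f"[results for: {query}]",  # stub backend
}


def dispatch(tool_name: str, argument: str) -> str:
    """Run the named tool, returning an error string for unknown names."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return f"error: unknown tool {tool_name!r}"
    return handler(argument)
```

Returning an error string rather than raising lets the application inject the failure into context, so the model can see that the call did not succeed.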

Why tool use matters

Without tools, the model is a closed system: it can only draw on what it learnt during training, which is frozen at a training cutoff and subject to Hallucination. With tools:

Time-sensitivity is resolved: web search retrieves current information.
Arithmetic and counting are reliable: the Python interpreter computes exactly.
Character-level tasks (spelling, string formatting) are reliable: code operates on actual characters, not tokens.
Long documents are accessible: file uploads put content directly in context.
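As an illustration of the arithmetic and character-level points, these are the kinds of snippets a model might hand to a Python interpreter instead of answering from memory. The word and the numbers are arbitrary examples chosen here, not taken from the source.

```python
# Character-level work: code sees individual characters, not tokens.
word = "strawberry"
r_count = word.count("r")          # exact count of the letter "r"
reversed_word = word[::-1]         # exact character-by-character reversal

# Arithmetic: Python integers are arbitrary precision, so the product is exact.
a, b = 123456789, 987654321
product = a * b

print(r_count, reversed_word, product)
```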

Karpathy’s framing: just as humans reach for a calculator rather than doing complex arithmetic in their heads, well-designed LLM systems reach for tools rather than reasoning beyond their reliable capability.

Practical guidance

Task type	Recommended tool
Recent events / current data	Web search
Arithmetic, statistics, plotting	Python interpreter
Reading papers or books	File upload
Counting, character operations	Code
Long-horizon research	Deep research mode

Agentic tool use

In agentic settings, the model issues multiple tool calls in sequence, using the result of each call to decide the next step. See Agentic Engineering and Vibe Coding for how this extends to full software development workflows.
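A sequential tool-calling loop might look roughly like the sketch below. Everything in it is an assumption made for illustration: `run_agent` is a hypothetical driver, `scripted_policy` stands in for the model with a fixed script, and the tools are stubs. The step limit is one simple way to keep the loop from running forever.

```python
from typing import Callable, Dict, List, Tuple

# A step is either (tool_name, argument) or ("final", answer).
Step = Tuple[str, str]
Policy = Callable[[List[str]], Step]


def run_agent(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> str:
    """Ask the policy for steps until it returns a final answer."""
    history: List[str] = [f"task: {task}"]
    for _ in range(max_steps):
        kind, payload = policy(history)
        if kind == "final":
            return payload
        tool = tools.get(kind)
        observation = tool(payload) if tool else f"error: unknown tool {kind!r}"
        # Each result is appended so the next decision can depend on it.
        history.append(f"{kind}({payload}) -> {observation}")
    return "stopped: step limit reached"


def scripted_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for the model: two tool calls, then an answer."""
    if len(history) == 1:
        return ("search", "example query")
    if len(history) == 2:
        return ("calculator", "2 + 2")
    return ("final", f"done after {len(history) - 1} tool calls")


# Stub tools for the demonstration only.
demo_tools = {
    "search": lambda q: f"[results for: {q}]",
    "calculator": lambda e: "4",
}

answer = run_agent(scripted_policy, demo_tools, "demo task")
print(answer)
```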
