AI Engineering

AI Engineering is the discipline of using pre-trained foundation models, accessed via API or open weights, to build production applications. The field is systematised in Chip Huyen’s book AI Engineering (O’Reilly, 2025).

The defining distinction from ML Engineering:

	ML Engineering	AI Engineering
Core activity	Training and fine-tuning models	Using existing models to build products
Entry barrier	High (requires deep knowledge of training algorithms, hardware, distributed systems)	Moderate (requires application architecture, prompt engineering, evals)
Primary artefact	A trained model	A deployed application or agent
Tooling	PyTorch, CUDA, training infrastructure	APIs, RAG pipelines, agent frameworks, evals
Representative role	Research scientist at a frontier lab	Engineer building a Cursor competitor

Why the distinction matters

The availability of capable foundation models as a service fundamentally changed the economics of AI application development. Before model APIs, building an AI product required the expertise to train a model. With them, an engineer who understands application architecture, data pipelines, and prompting can build production-grade AI products without deep knowledge of model training.

This does not make ML Engineering obsolete, since foundation models still have to be trained, but it created a new and much larger population of engineers whose work, building with AI, demands different skills from building AI itself.

The AI Engineering skill stack

From Chip Huyen on AI Engineering and Chip’s book:

Prompt engineering — writing prompts that reliably elicit correct outputs; few-shot design; chain-of-thought elicitation.
RAG design — retrieval architecture, chunking strategy, contextual enrichment, vector database configuration. See Tool Use.
Evals — designing and running evaluations that measure whether the product is working. See Evals.
Fine-tuning — when and how to adjust a base model on domain-specific data (less common for application engineers; more common at product maturity).
System thinking — understanding how components interact and where failures originate. Chip cites this as the most important and most durable skill, irreplaceable by AI tools.
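Two of the skills above, few-shot prompt design and chunking for retrieval, can be sketched in a few lines. This is a minimal illustration rather than anything taken from the book: the function names, the "Input:/Output:" prompt layout, and the word-based chunk sizes are all assumptions chosen for clarity, and a production RAG pipeline would more likely chunk by tokens or document structure.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str,
                          examples: List[Tuple[str, str]],
                          query: str) -> str:
    """Assemble an instruction, worked examples, and the new query into one prompt.

    The "Input:/Output:" layout is an illustrative convention, not a standard.
    """
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # The trailing "Output:" cues the model to answer in the same format.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap is there so that a sentence falling on a chunk boundary still appears whole in at least one chunk, which is one of the simpler levers in chunking strategy.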

System thinking as the non-automatable core

Chip’s argument: AI tools are good at executing well-defined, contained tasks. They struggle with problems that require understanding how multiple components interact — what she calls “holistic vs. local thinking.” Debugging a production system failure often requires tracing causes across layers that a local AI coding assistant cannot see.

Corollary from Stanford professor Mehran Sahami: CS education was never really about coding. It was about system thinking — understanding how to decompose a problem into a solution. Coding is the means; decomposition is the skill. AI can automate many of the means; the skill remains.

What actually improves AI apps

Chip’s viral framing of what most teams get wrong:

Overvalued	Undervalued
Keeping up with latest AI news	Talking to users
Adopting new agentic frameworks	Building reliable platforms
Agonising over vector database choice	Preparing better data
Constantly comparing models	Optimising end-to-end workflows
Fine-tuning	Writing better prompts

The practical implication: for most AI products, the performance ceiling is set not by model capability but by data quality, user understanding, and prompt design. The highest-leverage investment is also the cheapest: user research and better prompts.
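One way to make "better prompts" measurable is to score prompt variants against a small, fixed eval set. The sketch below assumes exact-match scoring and uses a hypothetical `stub_model` in place of a real model call; the stub's behaviour is contrived purely so the example runs offline, and the numbers it produces say nothing about any real model.

```python
from typing import Callable, Dict, List


def exact_match_accuracy(model: Callable[[str], str],
                         template: str,
                         cases: List[Dict[str, str]]) -> float:
    """Fraction of cases where the normalised model output equals the expected answer."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = 0
    for case in cases:
        prompt = template.format(question=case["question"])
        answer = model(prompt).strip().lower()
        hits += answer == case["expected"].strip().lower()
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: terse only when the prompt asks for it."""
    question = prompt.rsplit("Q:", 1)[-1].strip()
    answer = {"2 + 2": "4", "capital of France": "Paris"}.get(question, "unknown")
    if "one word" in prompt:
        return answer
    return f"The answer is {answer}."


cases = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "capital of France", "expected": "Paris"},
]
print(exact_match_accuracy(stub_model, "Q: {question}", cases))  # 0.0
print(exact_match_accuracy(stub_model, "Answer in one word.\nQ: {question}", cases))  # 1.0
```

With a real model, `stub_model` would be replaced by a function that sends the prompt to the model and returns its text. The harness is unchanged, so the same eval set can be reused each time a prompt is revised.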

Where mainstream views differ