AI Engineering
AI Engineering is the discipline of using pre-trained foundation models, accessed via API or open weights, to build production applications. The term was popularised and systematised by Chip Huyen in her book AI Engineering (O’Reilly, 2025).
The defining distinction from ML Engineering:
| | ML Engineering | AI Engineering |
|---|---|---|
| Core activity | Training and fine-tuning models | Using existing models to build products |
| Entry barrier | High (requires deep knowledge of training algorithms, hardware, distributed systems) | Moderate (requires application architecture, prompt engineering, evals) |
| Primary artefact | A trained model | A deployed application or agent |
| Tooling | PyTorch, CUDA, training infrastructure | APIs, RAG pipelines, agent frameworks, evals |
| Representative role | Research scientist at a frontier lab | Engineer building a Cursor competitor |
Why the distinction matters
The availability of capable foundation models as a service changed the economics of AI application development fundamentally. Before APIs, building an AI product required training expertise. After APIs, an engineer who understands application architecture, data pipelines, and prompting can build production-grade AI products without deep ML training knowledge.
This does not make ML Engineering obsolete, since foundation models still have to be trained, but it created a new and much larger population of engineers whose work is building with AI rather than building AI, and who need a different set of skills for it.
The AI Engineering skill stack
From Chip Huyen on AI Engineering and Chip’s book:
- Prompt engineering — writing prompts that reliably elicit correct outputs; few-shot design; chain-of-thought elicitation.
- RAG design — retrieval architecture, chunking strategy, contextual enrichment, vector database configuration. See Tool Use.
- Evals — designing and running evaluations that measure whether the product is working. See Evals.
- Fine-tuning — when and how to adjust a base model on domain-specific data (less common for application engineers; more common at product maturity).
- System thinking — understanding how components interact and where failures originate. Chip cites this as the most important and most durable skill, irreplaceable by AI tools.
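To make the first two items concrete, here is a minimal sketch of overlapping chunking and few-shot prompt assembly. It is an illustration under stated assumptions, not a method taken from the book: the names `chunk_text`, `Example`, and `build_few_shot_prompt`, the character-based window, and the `Q:`/`A:` layout are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Character windows are an assumption made for brevity; real pipelines
    often split on sentences, headings, or tokens instead.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


@dataclass
class Example:
    """One worked question/answer pair used as a few-shot demonstration."""
    question: str
    answer: str


def build_few_shot_prompt(
    instruction: str,
    examples: list[Example],
    context: list[str],
    question: str,
) -> str:
    """Assemble instruction, retrieved context, demonstrations, and the query."""
    parts = [instruction.strip(), ""]
    if context:
        parts.append("Context:")
        parts.extend(f"[{i}] {c}" for i, c in enumerate(context, start=1))
        parts.append("")
    for ex in examples:
        parts.append(f"Q: {ex.question}")
        parts.append(f"A: {ex.answer}")
        parts.append("")
    # End on an open "A:" so the model completes the answer.
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


if __name__ == "__main__":
    chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
    print(chunks)
    print(build_few_shot_prompt(
        "Answer using only the context.",
        [Example("What is 2 + 2?", "4")],
        chunks[:2],
        "Which letters appear in chunk 1?",
    ))
```

The overlap means a sentence that straddles a chunk boundary can still appear whole in one of the two neighbouring chunks, at the cost of storing some text twice. Chunk size and overlap are tuning parameters, best chosen by measuring retrieval quality on your own data.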
System thinking as the non-automatable core
Chip’s argument: AI tools are good at executing well-defined, contained tasks. They struggle with problems that require understanding how multiple components interact — what she calls “holistic vs. local thinking.” Debugging a production system failure often requires tracing causes across layers that a local AI coding assistant cannot see.
Corollary from Stanford professor Mehran Sahami: CS education was never really about coding. It was about system thinking — understanding how to decompose a problem into a solution. Coding is the means; decomposition is the skill. AI can automate many of the means; the skill remains.
What actually improves AI apps
Chip’s viral framing of what most teams get wrong:
| Overvalued | Undervalued |
|---|---|
| Keeping up with latest AI news | Talking to users |
| Adopting new agentic frameworks | Building reliable platforms |
| Agonising over vector database choice | Preparing better data |
| Constantly comparing models | Optimising end-to-end workflows |
| Fine-tuning | Writing better prompts |
The practical implication: for most AI products, the performance ceiling is not model capability — it is data quality, user understanding, and prompt design. The highest-leverage investment is the cheapest: user research and better prompts.
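One way to act on this is to score prompt variants against a small labelled set before reaching for a new model or framework. The sketch below is illustrative only: `exact_match_rate` and `stub_model` are hypothetical names, the stub is a hand-written stand-in for a real model call and is rigged to make the comparison visible, and exact match is only one of many possible scoring rules.

```python
from typing import Callable


def exact_match_rate(
    model: Callable[[str], str],
    template: str,
    cases: list[tuple[str, str]],
) -> float:
    """Fraction of cases where the model output equals the expected answer.

    `template` must contain a `{question}` placeholder. Comparison ignores
    case and surrounding whitespace.
    """
    if not cases:
        raise ValueError("need at least one evaluation case")
    hits = 0
    for question, expected in cases:
        output = model(template.format(question=question))
        if output.strip().lower() == expected.strip().lower():
            hits += 1
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (an assumption for this sketch).

    It answers tersely only when the prompt asks for one word, which mimics
    a model whose output format depends on the instructions it is given.
    """
    if "one word" in prompt.lower():
        return "Paris" if "France" in prompt else "Tokyo"
    return "The capital is probably Paris."


CASES = [
    ("What is the capital of France?", "paris"),
    ("What is the capital of Japan?", "tokyo"),
]

BASELINE = "{question}"
IMPROVED = "Answer in one word.\n{question}"

if __name__ == "__main__":
    print("baseline:", exact_match_rate(stub_model, BASELINE, CASES))
    print("improved:", exact_match_rate(stub_model, IMPROVED, CASES))
```

The scores here say nothing about any real model, because the stub is rigged. The point is the shape of the loop: the same cases and the same scoring rule, with only the prompt changed, so that a prompt edit becomes a measurable change rather than a guess. Swapping `stub_model` for a function that calls a real model keeps the rest unchanged.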
Where mainstream views differ
Some practitioners argue that the ML/AI Engineering distinction is drawn too sharply: it understates the importance of ML fundamentals for application engineers, in particular how often they need to understand training behaviour to debug unexpected model outputs. Chip’s response: the book is descriptive, not prescriptive; the profession exists regardless of whether the boundary is clean.