Concept

AI Engineering

concept ai-engineering ml-engineering software-development professional

AI Engineering is the discipline of building production applications on top of pre-trained foundation models, accessed via API or open weights. The term is closely associated with Chip Huyen, who systematised the discipline in her book AI Engineering (O’Reilly, 2025).

The defining distinction from ML Engineering:

|                     | ML Engineering | AI Engineering |
|---------------------|----------------|----------------|
| Core activity       | Training and fine-tuning models | Using existing models to build products |
| Entry barrier       | High (requires deep knowledge of training algorithms, hardware, distributed systems) | Moderate (requires application architecture, prompt engineering, evals) |
| Primary artefact    | A trained model | A deployed application or agent |
| Tooling             | PyTorch, CUDA, training infrastructure | APIs, RAG pipelines, agent frameworks, evals |
| Representative role | Research scientist at a frontier lab | Engineer building a Cursor competitor |

Why the distinction matters

The availability of capable foundation models as a service changed the economics of AI application development fundamentally. Before APIs, building an AI product required training expertise. After APIs, an engineer who understands application architecture, data pipelines, and prompting can build production-grade AI products without deep ML training knowledge.

This does not make ML Engineering obsolete, since foundation models still have to be trained. But it created a new and much larger population of engineers, and building with AI demands different skills from building AI.


The AI Engineering skill stack

From Chip Huyen on AI Engineering and Chip’s book:

  1. Prompt engineering — writing prompts that reliably elicit correct outputs; few-shot design; chain-of-thought elicitation.
  2. RAG design — retrieval architecture, chunking strategy, contextual enrichment, vector database configuration. See Tool Use.
  3. Evals — designing and running evaluations that measure whether the product is working. See Evals.
  4. Fine-tuning — when and how to adjust a base model on domain-specific data (less common for application engineers; more common at product maturity).
  5. System thinking — understanding how components interact and where failures originate. Chip cites this as the most important and most durable skill, irreplaceable by AI tools.
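As a rough illustration of items 1 and 2, the sketch below chunks a document into overlapping word windows, retrieves the most relevant chunks, and assembles a few-shot prompt around them. Everything here is an assumption made for illustration: the names `chunk_text`, `retrieve`, `build_prompt` and `Example` are hypothetical, and the word-overlap ranking is a deliberately naive stand-in for the embedding search a vector database would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One few-shot demonstration (hypothetical helper type)."""
    question: str
    answer: str


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by shared lowercase words with the query.

    Assumption: this stands in for vector similarity search.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, examples: list[Example], context: list[str]) -> str:
    """Assemble instruction, retrieved context, few-shot examples, then the question."""
    parts = ["Answer using only the context below."]
    parts.append("Context:\n" + "\n".join(f"- {chunk}" for chunk in context))
    for example in examples:
        parts.append(f"Q: {example.question}\nA: {example.answer}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    document = (
        "Refunds are processed within five working days. "
        "Shipping to Europe takes two weeks. "
        "Support can be reached by email at any time."
    )
    question = "How long until refunds are processed?"
    context = retrieve(question, chunk_text(document, size=8, overlap=2), k=1)
    shots = [Example("Can I reach support by email?", "Yes, at any time.")]
    print(build_prompt(question, shots, context))
```

The prompt string would then be sent to whichever model API the application uses. Chunk size, overlap and `k` are the kind of parameters that retrieval design involves tuning; the values above are arbitrary.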

System thinking as the non-automatable core

Chip’s argument: AI tools are good at executing well-defined, contained tasks. They struggle with problems that require understanding how multiple components interact — what she calls “holistic vs. local thinking.” Debugging a production system failure often requires tracing causes across layers that a local AI coding assistant cannot see.

Corollary from Stanford professor Mehran Sahami: CS education was never really about coding. It was about system thinking — understanding how to decompose a problem into a solution. Coding is the means; decomposition is the skill. AI can automate many of the means; the skill remains.
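One way to picture the cross-layer debugging described above is a pipeline in which every stage carries its own health check, so that a symptom seen in the final answer can be traced back to the layer that caused it. This is a minimal sketch, not anything taken from the book: `Stage`, `first_failing_stage` and the two toy stages are hypothetical names, and the retrieval bug is simulated.

```python
from typing import Any, Callable, NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    run: Callable[[Any], Any]       # transforms the payload
    check: Callable[[Any], bool]    # returns True if the stage output looks healthy


def first_failing_stage(stages: list[Stage], payload: Any) -> Optional[str]:
    """Run stages in order and return the name of the first one whose check fails."""
    for stage in stages:
        payload = stage.run(payload)
        if not stage.check(payload):
            return stage.name
    return None


def retrieval(query: str) -> dict:
    # Simulated bug: the retriever silently returns no documents.
    return {"query": query, "docs": []}


def generation(state: dict) -> dict:
    # Stand-in for a model call: without documents it can only refuse.
    answer = "Grounded answer." if state["docs"] else "I don't know."
    return {**state, "answer": answer}


PIPELINE = [
    Stage("retrieval", retrieval, lambda state: len(state["docs"]) > 0),
    Stage("generation", generation, lambda state: state["answer"] != "I don't know."),
]

if __name__ == "__main__":
    print(first_failing_stage(PIPELINE, "What is the refund policy?"))
```

A user only sees the unhelpful answer from the generation step, and a tool looking only at that step would find nothing wrong with it. The per-stage checks point at retrieval instead, which is the kind of whole-system view the argument is about.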


What actually improves AI apps

Chip’s viral framing of what most teams get wrong:

| Overvalued | Undervalued |
|------------|-------------|
| Keeping up with latest AI news | Talking to users |
| Adopting new agentic frameworks | Building reliable platforms |
| Agonising over vector database choice | Preparing better data |
| Constantly comparing models | Optimising end-to-end workflows |
| Fine-tuning | Writing better prompts |

The practical implication: for most AI products, the performance ceiling is not model capability — it is data quality, user understanding, and prompt design. The highest-leverage investment is the cheapest: user research and better prompts.
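To make "better prompts" measurable rather than a matter of taste, a team can score prompt variants against a small fixed set of cases. The sketch below assumes an exact-match metric and uses `stub_model`, a deterministic toy stand-in for a real model call, so the numbers it produces say nothing about any actual model; `accuracy` and the two templates are likewise hypothetical.

```python
from typing import Callable

# Each case pairs an input with the expected output.
EvalCase = tuple[str, str]


def accuracy(model: Callable[[str], str], template: str, cases: list[EvalCase]) -> float:
    """Fraction of cases where the model output exactly matches the expected label."""
    if not cases:
        raise ValueError("cases must not be empty")
    hits = 0
    for text, expected in cases:
        output = model(template.format(input=text))
        hits += output.strip().lower() == expected.lower()
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    """Toy stand-in for a model API call (assumption, not a real model).

    It returns a bare label only when the prompt asks for a one-word answer,
    mimicking how an unconstrained prompt tends to produce verbose output.
    """
    label = "positive" if "love" in prompt else "negative"
    if "one word" in prompt:
        return label
    return f"The sentiment is {label}."


CASES: list[EvalCase] = [
    ("I love this product", "positive"),
    ("This was a waste of money", "negative"),
]

VAGUE_TEMPLATE = "Classify the sentiment: {input}"
STRICT_TEMPLATE = (
    "Classify the sentiment as positive or negative. "
    "Answer with one word.\n{input}"
)

if __name__ == "__main__":
    for name, template in [("vague", VAGUE_TEMPLATE), ("strict", STRICT_TEMPLATE)]:
        print(name, accuracy(stub_model, template, CASES))
```

With a real model in place of the stub, the same loop lets a prompt change be accepted or rejected on evidence from cases drawn from actual user traffic, which ties the prompt work back to the user research in the table.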


Where mainstream views differ

Some practitioners argue that the ML/AI Engineering distinction understates the importance of ML fundamentals for application engineers, in particular how often they need to understand training behaviour to debug unexpected model outputs. Chip’s response: the book is descriptive, not prescriptive; the profession exists regardless of whether the boundary is clean.


See also