Concept

Agentic Engineering

Agentic engineering is Andrej Karpathy’s term for the discipline of coordinating AI agents to build and maintain professional-quality software: preserving correctness guarantees while operating at the higher speed and scale that agents enable.


Contrast with vibe coding

                | Vibe Coding                | Agentic Engineering
Who             | Anyone                     | Trained engineers
Goal            | Raise the floor            | Raise the ceiling
Quality bar     | Acceptable for exploration | Professional-grade
Error ownership | Errors are tolerated       | Errors are yours
Speed           | Fast                       | Much faster than pre-agent engineering

Agentic engineering is not a rejection of agent tools — it is their disciplined use. The agents are the same; the accountability and oversight are different.


The 10× ceiling is obsolete

The pre-agent “10× engineer” framing is too conservative. People who are fully AI-native and skilled at agentic engineering operate at multiples that dwarf 10×. The ceiling has moved dramatically.

The difference between mediocre and excellent agentic engineering comes down to investment: investing in your setup, learning tools deeply, utilising all their features. The same discipline that previously applied to mastering vim or VS Code now applies to Claude Code, Cursor, and their successors.


Security and correctness

Vulnerabilities introduced by careless agent use are still your vulnerabilities. Agents are:

  • Powerful — can generate and modify large quantities of code quickly.
  • Fallible — stochastic, capable of surprising mistakes.
  • Unaware of broader context — a single agent call does not understand your security model, your test coverage, or your deployment environment.

Agentic engineering requires the same verification discipline as traditional engineering — perhaps more, because the speed of generation makes it easy to accumulate unreviewed changes.


A new hiring test

Karpathy’s proposed replacement for puzzle-solving interviews:

  1. Give the candidate a large project — build a Twitter clone, make it secure, deploy it.
  2. Attack their implementation with adversarial agents.
  3. Can it hold?

The ability to build software that withstands agent-assisted attack is the right test for agentic engineering capability.


Boris Cherny: agentic engineering in practice

Boris Cherny on Claude Code provides the most concrete operational account of agentic engineering at scale. Key additions from that source:

  • Under-funding forces Claude-ification. One engineer on a project, intrinsically motivated to ship, will Claude-ify their workflow without being told to.
  • Coding is virtually solved for the kinds of work Boris does. The next frontier is not better coding but idea generation, project management, and non-engineering tasks (Cowork).
  • Everyone codes. On the Claude Code team: PM, EM, designer, finance, data scientist — all code. Role boundaries between engineer, designer, and PM have ~50% overlap and are expected to collapse further.
  • 200% productivity per engineer (measured in PRs) at Anthropic in the year since Claude Code launched.
  • Build for the model 6 months from now, not today’s. Accept poor PMF early; design for the capability that is coming. See Bitter Lesson.

Amjad Masad: agentic engineering for non-engineers

Amjad Masad on Replit extends the picture in an important direction: agentic engineering is not only for trained engineers. Replit’s agent enables non-engineers (PMs, founders, designers) to build and deploy production-ready MVPs.

Key additions:

  • Democratisation. The binding constraint has shifted from production to idea generation. When anyone can build, the scarce skill is deciding what to build.
  • ACI. Agentic platforms require purpose-designed interfaces for LLM consumption (ACI) — not human interfaces repurposed for agents.
  • Amjad’s Law. The ROI of minimal coding literacy (prompting, reading, debugging) doubles every six months as AI amplifies it. See Amjad's Law.
  • Zero-employee companies. The logical endpoint: billion-dollar businesses run by one human with AI handling development, support, and operations.
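
Read literally, "doubles every six months" in Amjad’s Law describes exponential growth. As a worked illustration only, let R_0 be the return on minimal coding literacy at some baseline date and t the number of months elapsed; both symbols are introduced here and do not come from the source.

```latex
R(t) = R_0 \cdot 2^{t/6}, \qquad R(12) = 4\,R_0, \qquad R(24) = 16\,R_0
```

Under the law's own assumption, the same skill would return roughly 16 times as much after two years. The formula restates the claim; it is not evidence for it.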

Bret Taylor: agent is the new app

Bret Taylor on Sierra provides the enterprise commercialisation view of agentic engineering. Key additions:

  • Agent is the new app. Customer-facing AI agents will become the primary interface for every company — not a feature added to existing products, but the product. This is structural, not incremental.
  • Three-tier AI market. Frontier models (hyperscaler only) → tooling (Developer Day risk) → applied/agents (large opportunity). The applied layer is where agentic engineering creates durable competitive value because the moat is workflow knowledge and organisational context, not model quality.
  • Context engineering over model-waiting. Most AI output failures in production codebases are context failures, not capability failures. Dedicate engineering capacity to identifying missing context and supplying it via MCP — this compounds over time whereas scaffolding decays (Bitter Lesson).
  • Outcomes-based pricing as the commercial expression of agentic value: charging per achieved outcome (resolved interaction, completed transaction) rather than per token. See Outcomes-Based Pricing.

Claire Vo: agent as employee and multi-agent specialisation

Claire Vo on OpenClaw provides the most operational account of personal agentic engineering in the wiki. Key additions:

  • Agent as employee. The most useful deployment frame: every decision about agent setup (account provisioning, access scope, trust escalation) should mirror how you would onboard a human assistant. OpenClaw failures are context failures, not model failures — the same as an employee not knowing what they’re supposed to do.
  • Multi-agent specialisation. Context overload is the fundamental constraint. A single agent cannot hold the context for every domain without degrading performance. Separate agents by role (work, family, sales, podcast) using the same logic as Slack channels — not one general-purpose agent for everything.
  • Soul/heartbeat architecture. Persistent agents feel alive because of three elements: a persistent identity file (soul), a scheduled heartbeat (cron-based task checking), and cumulative memory. The “aliveness” is scheduling + identity, not magic.
  • The web is hostile to agents. Browser automation is unreliable because websites are architecturally hardened against bots. API-first (then MCP) is the correct interface strategy; browser use is a workaround that will remain unreliable until agent-native web interfaces emerge. See ACI.
  • Management skills transfer. Role scoping, context documentation, progressive trust, and outcome management — not technical skills — determine whether an agent team succeeds.
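
The soul/heartbeat architecture above can be sketched in a few lines. This is a minimal illustration and not OpenClaw's actual implementation: the `PersistentAgent` class, the file names `soul.md` and `memory.jsonl`, and the `check_tasks` callback are all hypothetical, and the model call is replaced by a stand-in function.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class PersistentAgent:
    """Hypothetical persistent agent: identity file + heartbeat + memory log."""
    home: Path

    @property
    def soul_path(self) -> Path:
        return self.home / "soul.md"        # persistent identity (assumed file name)

    @property
    def memory_path(self) -> Path:
        return self.home / "memory.jsonl"   # cumulative memory (assumed file name)

    def load_soul(self) -> str:
        return self.soul_path.read_text(encoding="utf-8")

    def remember(self, note: str) -> None:
        # Append-only, so memory accumulates across heartbeats and restarts.
        with self.memory_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"note": note}) + "\n")

    def recall(self) -> List[str]:
        if not self.memory_path.exists():
            return []
        lines = self.memory_path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line)["note"] for line in lines if line]

    def heartbeat(self, check_tasks: Callable[[str, List[str]], List[str]]) -> List[str]:
        """One scheduled tick: reload identity and memory, check for work, record it."""
        done = check_tasks(self.load_soul(), self.recall())
        for item in done:
            self.remember(item)
        return done


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        agent = PersistentAgent(home=Path(tmp))
        agent.soul_path.write_text("You are the podcast assistant.", encoding="utf-8")

        def check_tasks(soul: str, memory: List[str]) -> List[str]:
            # Stand-in for a model call: do one new thing per tick.
            return [f"task-{len(memory) + 1}"]

        agent.heartbeat(check_tasks)   # in a real setup, cron or a scheduler calls this
        agent.heartbeat(check_tasks)
        return agent.recall()
```

Each heartbeat starts from the files on disk rather than from in-process state, which is what lets the agent appear continuous across restarts.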

Dan Shipper: allocation economy and compounding engineering

Dan Shipper on Every contributes two frameworks that extend agentic engineering into the organisational and economic dimensions.

  • Allocation economy. As AI agents do knowledge work, humans shift from doing to allocating — scoping tasks, evaluating outputs, giving feedback, holding vision. These are management skills, currently rare (≈8% of workers). Dan’s thesis: they will become universal because managing agents is cheap. This reframes agentic engineering: the high-value meta-skill is not writing agent code but directing agent work well.
  • Compounding engineering. Each unit of work should make the next unit easier. Mechanisms: encoding feedback as reusable prompts, building automations rather than repeating manual processes, investing in knowledge bases. At Every, every feedback session becomes a prompt; the accumulation compounds. See Compounding Engineering.
  • Head of AI operations. An emerging organisational role (Katie Parrott at Every): translates the principal’s taste and judgment into machine-readable prompts and automations. Not a developer; a process and communication expert. Predicted to become standard as agentic teams scale.
  • AGI as profitable infinite agents. AGI is not a capability threshold but an economic one: the point at which it is profitable to run agents indefinitely without human oversight. Dan uses the Winnicott leash metaphor: the leash disappears when the economics of supervision change, not when capability crosses a bar.
  • CEO as adoption predictor. The strongest predictor of org-wide AI adoption is whether the CEO uses AI tools daily.
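
One way to picture "every feedback session becomes a prompt" is a small store that folds each piece of review feedback into the prompt used for all later tasks. The sketch below is an assumption about how such a mechanism could look, not a description of Every's tooling; `PromptLibrary` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptLibrary:
    """Hypothetical store that turns review feedback into reusable prompt rules."""
    base_prompt: str
    rules: List[str] = field(default_factory=list)

    def add_feedback(self, rule: str) -> None:
        # Deduplicate so repeated feedback does not bloat the prompt.
        rule = rule.strip()
        if rule and rule not in self.rules:
            self.rules.append(rule)

    def render(self, task: str) -> str:
        # Every future task inherits every past lesson.
        lines = [self.base_prompt]
        if self.rules:
            lines.append("Lessons from earlier reviews:")
            lines.extend(f"- {r}" for r in self.rules)
        lines.append(f"Task: {task}")
        return "\n".join(lines)


lib = PromptLibrary("You are an editor for a newsletter.")
lib.add_feedback("Lead with the conclusion.")
lib.add_feedback("Lead with the conclusion.")   # repeated feedback is stored once
lib.add_feedback("Cut adverbs.")
print(lib.render("Edit Tuesday's draft"))
```

The compounding comes from `render`: the cost of giving a piece of feedback is paid once, while every later task benefits from it.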

Michael Truell: chop things up and the logic designer

Michael Truell on Cursor and the World After Code provides the most product-grounded view of agentic engineering from the perspective of a toolmaker building for professional engineers.

The “chop things up” pattern. Observation from Cursor’s user data: the most successful power users break work into small increments — specify a little, review, specify a little more — rather than issuing a large specification and waiting for a complete output. Two anti-patterns:

  • Junior engineers go too wholesale: they accept too much AI output without adequate review and hit a wall when the codebase grows beyond their ability to understand it.
  • Senior engineers under-use: they stick to existing workflows and underestimate what AI can do for them.

The right operating pattern today is tight iteration. This operationalises the agentic engineering principle of maintaining human oversight — not as a security/correctness concern (Karpathy’s emphasis) but as a product-quality concern: tight loops produce better outcomes than long autonomous runs.
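
The tight-iteration pattern can be expressed as a loop with a review gate after every small step. This is a sketch under stated assumptions, not Cursor's workflow: `run_in_small_steps`, `generate`, and `review` are hypothetical names, with `generate` standing in for an agent call and `review` for the engineer's judgment.

```python
from typing import Callable, List, Tuple


def run_in_small_steps(
    steps: List[str],
    generate: Callable[[str], str],
    review: Callable[[str, str], bool],
) -> Tuple[List[str], List[str]]:
    """Apply one small specification at a time, reviewing before moving on.

    Returns (accepted_changes, remaining_steps). Stops at the first rejected
    change so unreviewed work never piles up.
    """
    accepted: List[str] = []
    for i, step in enumerate(steps):
        change = generate(step)          # stand-in for an agent call
        if not review(step, change):     # stand-in for the engineer's review
            return accepted, steps[i:]
        accepted.append(change)
    return accepted, []


steps = ["add User model", "add signup endpoint", "add rate limiting"]
accepted, remaining = run_in_small_steps(
    steps,
    generate=lambda s: f"diff for: {s}",
    review=lambda s, change: "rate limiting" not in s,  # reviewer rejects the third step
)
```

Here the first two changes are accepted and the third is handed back for re-specification, rather than all three arriving as one large, unreviewed diff.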

The custom model ensemble as agentic architecture. Cursor’s internal model stack illustrates how agentic engineering at scale requires building the AI infrastructure, not just using it:

  • Autocomplete (custom, fast, diff-specific) → small tight loop with the engineer on every keystroke.
  • Retrieval (custom input-side model) → supplies the right context to the large model.
  • Sketch-to-diff (custom output-side model) → translates large-model reasoning into executable changes.

This ensemble architecture shows that agentic engineering is not “use the best model for everything” but “use the right model for each sub-task and design the handoffs between them.” The human engineer’s role in this system is to review the outputs of the autocomplete and agent loops — i.e., to be the verification layer between AI sub-systems.
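
The handoffs in such an ensemble can be sketched as a pipeline of interchangeable stages with the human as the final gate. Nothing below reflects Cursor's internals: `Pipeline`, `keyword_retrieve`, and the lambda stages are hypothetical stand-ins for the retrieval model, the large model, and the sketch-to-diff model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pipeline:
    """Hypothetical three-stage handoff; each stage could be a different model."""
    retrieve: Callable[[str, Dict[str, str]], List[str]]  # input side: pick context
    plan: Callable[[str, List[str]], str]                 # large model: reason about the change
    to_diff: Callable[[str], str]                         # output side: sketch -> concrete diff
    approve: Callable[[str], bool]                        # human verification layer

    def run(self, request: str, codebase: Dict[str, str]) -> str:
        context = self.retrieve(request, codebase)
        sketch = self.plan(request, context)
        diff = self.to_diff(sketch)
        if not self.approve(diff):
            raise ValueError("diff rejected in review")
        return diff


def keyword_retrieve(request: str, codebase: Dict[str, str]) -> List[str]:
    # Toy retrieval: keep files that share at least one word with the request.
    words = set(request.lower().split())
    return [path for path, text in sorted(codebase.items())
            if words & set(text.lower().split())]


pipeline = Pipeline(
    retrieve=keyword_retrieve,
    plan=lambda req, ctx: f"edit {', '.join(ctx)} to {req}",
    to_diff=lambda sketch: f"--- sketch: {sketch}",
    approve=lambda diff: diff.startswith("---"),
)
codebase = {"auth.py": "def login user password", "billing.py": "def charge card"}
result = pipeline.run("hash the password", codebase)
```

Because each stage is just a callable, any one of them can be swapped for a different model without touching the others, which is the design point the ensemble illustrates.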

World after code as agentic engineering’s endpoint. Michael’s “world after code” thesis (engineer = logic designer) is the long-run endpoint of agentic engineering: when the implementation translation is reliable enough that specifying intent in pseudocode-like form is sufficient, the agentic workflow converges on intent specification + review, with all implementation handled by the AI stack. The transition from today’s tight loops to that end state is a gradual product evolution, not a sudden switch.


Simon Willison: popularising the term and the dark factory frontier

Simon Willison popularised “agentic engineering” as the term for professional software development using coding agents — distinct from Vibe Coding (not looking at code, not responsible for it). Willison’s definition emphasises that agentic engineering requires the full depth of professional engineering experience: “Using coding agents well is taking every inch of my 25 years of experience.” Running four agents in parallel is cognitively exhausting; cognitive load management is itself a skill.

The dark factory pattern is Willison’s term for the next frontier beyond agentic engineering: production-grade software produced by agents that the engineer does not directly review at all. The name comes from “lights-out factories”, which are so automated that the lights can be turned off. The leading verification mechanism is automated tests: if agents write comprehensive tests and all pass, the engineer can have confidence without line-by-line review. This is currently experimental; companies like StrongDM are actively exploring it. [§ Simon Willison on Agentic Engineering and the Future of Code]

Test-driven development as the agentic discipline. Willison argues tests are essential for safe agentic engineering: they confirm code ran, catch regressions, and enable parallelisation. Red/Green TDD (write test first → watch fail → implement → watch pass) is a standard practice that translates well to agents because it can be encoded as a compact prompt shortcut.


Sander Schulhoff: prompt injection as the agentic security constraint

Prompt Injection is the dominant unresolved security problem for agentic systems. With chatbot-only AI, prompt injection produces harmful text — limited blast radius. With agents that can write to databases, send email, execute code, or control embodied hardware, the same structural vulnerability produces harmful actions.

The agentic amplification problem. Every capability granted to an agent is an attack surface. If an agent can book flights, a successful injection can book flights for an attacker. If it can write to a database, an injection can corrupt records. The attack mechanism is identical to chatbot injection; the consequences are orders of magnitude larger. [§ Sander Schulhoff on Prompt Engineering and Red Teaming]

Indirect injection in agentic pipelines. Agents that read external content (webpages, documents, API responses) are vulnerable to adversarial instructions embedded in that content — with no malicious user action required at the time of attack. A malicious webpage can redirect a coding agent to inject a virus into the codebase it is working on. [§ Sander Schulhoff on AI Security and Guardrails]

Architectural implication. The correct response is not guardrails but least-privilege design: enumerate every capability the agent has; treat each as an attack surface; default to read-only; require human review at irreversible action boundaries. Agentic engineering discipline must incorporate security threat modelling alongside capability design — not as an afterthought.
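
A least-privilege design of this kind can be sketched as a gate that every agent action must pass through. The sketch is an assumption about one possible shape, not a vetted security control: `Risk`, `Capability`, `CapabilityGate`, and the `ask_human` callback are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Set


class Risk(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class Capability:
    name: str
    risk: Risk
    action: Callable[[str], str]


class CapabilityGate:
    """Hypothetical gate: enumerate capabilities, deny by default, escalate irreversible ones."""

    def __init__(self, ask_human: Callable[[str, str], bool]) -> None:
        self._caps: Dict[str, Capability] = {}
        self._granted: Set[str] = set()
        self._ask_human = ask_human

    def register(self, cap: Capability, grant_write: bool = False) -> None:
        # Every capability is listed explicitly; each one is an attack surface.
        self._caps[cap.name] = cap
        if grant_write:
            self._granted.add(cap.name)

    def call(self, name: str, arg: str) -> str:
        cap = self._caps.get(name)
        if cap is None:
            raise PermissionError(f"unknown capability: {name}")
        if cap.risk is not Risk.READ_ONLY and name not in self._granted:
            raise PermissionError(f"{name} is not granted (default is read-only)")
        if cap.risk is Risk.IRREVERSIBLE and not self._ask_human(name, arg):
            raise PermissionError(f"{name} rejected at human review")
        return cap.action(arg)


gate = CapabilityGate(ask_human=lambda name, arg: False)  # reviewer declines everything
gate.register(Capability("read_file", Risk.READ_ONLY, lambda p: f"contents of {p}"))
gate.register(Capability("write_db", Risk.REVERSIBLE, lambda row: f"wrote {row}"))
gate.register(Capability("send_email", Risk.IRREVERSIBLE, lambda to: f"sent to {to}"),
              grant_write=True)

print(gate.call("read_file", "README.md"))  # allowed: read-only needs no grant
```

With this setup, `write_db` is refused because it was never granted, and `send_email` is refused because the human reviewer declined. An injected instruction can therefore ask for an action, but cannot by itself cause one.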


See also