The Future of Software Engineering

This is a narrative theme tracing how AI agents are transforming software development, from experimental acceleration to a fundamentally new programming paradigm. It is synthesised from the following sources: From Vibe Coding to Agentic Engineering; How I Use LLMs; Deep Dive into LLMs like ChatGPT; Boris Cherny on Claude Code; and Nick Turley on ChatGPT.

The shift that happened

From Karpathy’s own experience, December 2024 marked a qualitative break. Agentic coding tools — Cursor, Claude Code — had been useful for a year, but required frequent correction. Then the corrections stopped. He kept asking for more; it kept working. The experience changed from “human-led with model assistance” to “model-led with human oversight.”

He framed this for his X audience as a signal about the nature of the change: not a speed-up, but a different relationship with the machine.

The framework: Software 1.0 → 2.0 → 3.0

Karpathy's three-paradigm framework explains the structural shift:

Software 1.0: explicit code, deterministic and engineer-authored.
Software 2.0: neural network weights, where the program is learned from training data rather than written by hand.
Software 3.0: natural language prompts, where the LLM is the interpreter and anyone who can articulate what they want can program.
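
To make the three paradigms concrete, here is a minimal sketch of one toy task (sentiment classification) written three ways. Everything in it is an illustrative assumption rather than anything from the source talks: the word lists, the tiny perceptron, the prompt text, and in particular `stub_llm`, which is a hypothetical offline stand-in for a real model call.

```python
from typing import Callable

# Software 1.0: behaviour is spelled out by hand as explicit rules.
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}

def classify_v1(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if score >= 0 else "negative"

# Software 2.0: behaviour lives in weights fitted to labelled examples.
def train_v2(examples: list[tuple[str, int]], epochs: int = 10) -> dict[str, float]:
    weights: dict[str, float] = {}
    for _ in range(epochs):
        for text, label in examples:  # label is +1 or -1
            words = text.lower().split()
            score = sum(weights.get(w, 0.0) for w in words)
            prediction = 1 if score >= 0 else -1
            if prediction != label:  # perceptron rule: update only on mistakes
                for w in words:
                    weights[w] = weights.get(w, 0.0) + label
    return weights

def classify_v2(weights: dict[str, float], text: str) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Software 3.0: the "program" is a natural-language prompt interpreted by an LLM.
PROMPT = "Classify the sentiment of the following review as positive or negative:\n{review}"

def classify_v3(llm: Callable[[str], str], text: str) -> str:
    return llm(PROMPT.format(review=text)).strip().lower()

def stub_llm(prompt: str) -> str:
    # Hypothetical stand-in so the sketch runs offline; a real model call would go here.
    return "negative" if "hate" in prompt.lower() else "positive"

TRAINING_DATA = [
    ("i love this", 1),
    ("i hate this", -1),
    ("great product", 1),
    ("awful product", -1),
]
WEIGHTS = train_v2(TRAINING_DATA)
```

Calling `classify_v1("I hate it")`, `classify_v2(WEIGHTS, "i hate it")`, and `classify_v3(stub_llm, "I hate it")` exercises all three versions on the same input. What differs is where the program lives: in the rules, in `WEIGHTS`, or in `PROMPT`.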

The engineering profession has crossed into Software 3.0. The tools (Cursor, Claude Code) are the early infrastructure.

What becomes possible that wasn’t before

Vibe Coding raises the floor: non-engineers can build functional software. This is not an acceleration of existing practice; it is a category expansion.

But the deeper change is what Agentic Engineering enables: professional engineers operating at multiples of their previous throughput while maintaining quality. The ceiling implied by the “10× engineer” is now far higher.

New capabilities also exist that had no prior implementation path. Karpathy’s LLM knowledge-base project is an example: no traditional program takes a document collection and compiles it into a wiki, but an LLM can do this natively. (See LLM OS; Software 1.0, 2.0, 3.0.)

What determines where AI accelerates fastest

Verifiability: domains with automatic reward signals see the most RL investment and the fastest capability improvement. Maths and code are the canonical cases. Domains where verification is hard — aesthetics, long-horizon judgment — progress more slowly.

This is both a diagnostic tool (why is the model bad at X?) and a strategic lens (where should a founder build?). If your domain has verifiable structure, you can potentially unlock capability through fine-tuning that the base model doesn’t have.
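
As a rough illustration of what an "automatic reward signal" looks like for code, the sketch below scores a candidate function by the fraction of test cases it passes. The names `reward`, `GCD_CASES`, `correct_gcd`, and `buggy_gcd` are hypothetical, and this is not a description of any lab's actual RL pipeline. It shows only the property that makes code verifiable: the score is computed with no human judgment.

```python
from typing import Any, Callable

def reward(candidate: Callable[..., Any], cases: list[tuple[tuple, Any]]) -> float:
    """Return the fraction of test cases the candidate passes.

    A candidate that raises an exception on a case simply fails that case.
    """
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash earns no credit for this case
    return passed / len(cases)

# Hypothetical task: greatest common divisor of two non-negative integers.
GCD_CASES = [((12, 18), 6), ((7, 5), 1), ((0, 9), 9), ((10, 0), 10)]

def correct_gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a

def buggy_gcd(a: int, b: int) -> int:
    # Deliberately wrong: passes only (0, 9) and crashes when b == 0.
    return a % b or b

print(reward(correct_gcd, GCD_CASES))  # 1.0
print(reward(buggy_gcd, GCD_CASES))    # 0.25
```

A domain such as aesthetics has no equivalent of `GCD_CASES`, which is one way to read the claim that it progresses more slowly.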

What changes in engineering practice

Investment in tooling. The leverage from being excellent at agentic tooling (Claude Code, Cursor) dwarfs the leverage from most other skills. The same logic applied to mastering vim or VS Code, but the ceiling here is far higher.
Verification discipline. Faster code generation increases the risk of unreviewed changes accumulating. Security vulnerabilities from agent-written code are still your vulnerabilities.
Hiring needs rethinking. Puzzle-solving interviews select for the wrong skills. A better test: have the candidate build and deploy a large project, then attack it with adversarial agents.

The long-horizon extrapolation

Karpathy’s “extremely foreign” endpoint: the neural computer — a device that takes raw audio/video, processes it through a neural network, and renders a UI customised to the moment. No traditional OS, no app layer. CPUs as co-processors; neural networks as the host process.

He acknowledges this is uncertain and the progression is “TBD.” But intelligence compute is already the dominant share of FLOPs, and the trajectory points toward this outcome piece by piece.

Boris Cherny: from inside Claude Code

Boris Cherny on Claude Code adds the practitioner’s ground-level account. Key additions:

Coding is virtually solved. For Boris’s own work, 100% of code has been AI-written since November 2025. He predicts this generalises to all codebases and tech stacks within months. The next frontier is not better coding but: (1) the model proposing what to build next (reading bug reports, telemetry); (2) non-engineering automation (Cowork).

The printing press analogy. Literacy went from <1% (scribes) to ~70% globally in the 200 years after Gutenberg. Coding is undergoing the same democratisation. Short-term disruption is real; long-term unlock is of comparable historical magnitude.

Role collapse. “Software engineer” as a title will begin disappearing in 2026; Boris expects it to be replaced by “builder.” His own team has ~50% role overlap between PM, engineer, and designer — all of whom code. Everyone will code; the question is what else they do alongside it.

The generalist advantage. Engineers who cross disciplines — product/infrastructure, strong design sense, business intuition, user empathy — will be disproportionately rewarded. Being AI-native is necessary but not sufficient.

Nick Turley: the interface hasn’t been built yet

Nick Turley on ChatGPT contributes a view from outside the engineering profession — the product perspective — that directly addresses the future shape of the field.

Chat is MS-DOS. Nick’s central contention: natural language is the right interface for AI, but the turn-by-turn chat paradigm is not the final form.

“ChatGPT feels a little bit like MS-DOS. We haven’t built Windows yet, and it’ll be obvious once we do.”

GPT-5 is already very good at front-end coding. There is no fundamental reason the model cannot render its own UI dynamically in response to context. The current chat box is the simplest possible interface, not the best. Nick is “baffled by how many people have copied the paradigm rather than trying out a different way of interacting with AI.”

Empiricism as the operating principle. Nick’s “maximally accelerated” forcing function encodes the same insight as the Bitter Lesson in product form: you cannot reason about what to build until you have shipped and seen real use. The properties of AI products are emergent. Polishing before shipping means polishing the wrong things.

The model is the product. There is no meaningful distinction between model and product. This means product development discipline (user interviews, data science, experimentation) should be applied to the model itself, not only to the features around it. Engineers who can think across both layers will have a structural advantage.