Scott Wu on Devin and the Future of Software Engineering

Guest: Scott Wu, co-founder and CEO of Cognition; previously co-founder of Lunchclub (AI professional networking). Competitive programming background; co-founders include alumni of Scale AI and Cursor.
Host: Lenny Rachitsky
Source: Lenny’s Podcast. Recorded ~early 2025 (approximately one year after Devin’s launch in early 2024).

Overview

Scott Wu tells the origin story of Devin, which Cognition bills as the first fully autonomous AI software engineer, and describes the team’s philosophy on the future of engineering. The conversation covers the RL paradigm shift that made Devin possible, the async-first workflow pattern, roughly eight product pivots before product-market fit, and the “bricklayer to architect” framing for how engineering roles evolve.

Key ideas

Devin as an async junior engineer. Devin works end-to-end autonomously: it accepts tasks via Slack, Linear, or GitHub, opens PRs, and sees them through to merge. Each member of Cognition’s 15-person team runs up to five Devins at once. As of recording, ~25% of Cognition’s PRs come from Devin; the team expects more than 50% by year end.
RL as the paradigm shift. The first ChatGPT era was imitation learning (read the internet, talk like the internet). Devin is built on high-compute RL: model does work → code runs → execution feedback → model learns. Code is uniquely well-suited because the feedback loop (run the code) is automated and unambiguous.
Bricklayer to architect. As Devin handles implementation, engineers shift from writing code to specifying intent: defining problems, thinking through architecture, scoping tasks clearly, reviewing output. The soul of engineering — telling your computer what to do — remains; the how increasingly belongs to the agent.
Jagged intelligence applied. Devin’s capability profile is non-uniform: some tasks it handles better than senior engineers; others require heavy human steering. Knowing which category a task falls into is itself a skill that Cognition’s team had to develop.
Async-first workflow. The right Devin usage pattern: parallelise across five tasks simultaneously; kick off and check in only when steering is needed; review output rather than watching step-by-step. This is a different working style from synchronous pair-programming tools.

Devin product overview

Interface. Slack integration (tag Devin on a thread); Linear (assign Devin to issues); GitHub (Devin opens PRs directly). Not a chatbot — a remote-engineer-flavoured workflow.

Capability level. At launch (~early 2024): described as a high school CS student. Over the year: college intern → junior engineer. “Jagged” in that some specific capabilities (code search, documentation, routine refactors) exceed human pace; complex architectural reasoning still requires human steering.

Knowledge accumulation. Devin accumulates institutional knowledge across its sessions with every team member. The Devin Wiki feature indexes the full codebase, generates architecture diagrams, and answers natural-language queries about how code components interact.

Async usage pattern. The productive workflow Cognition settled on:

1. Plan five tasks for the day
2. Kick off Devin instances in parallel on all five
3. Check in at the points where expert judgment is needed (scope definition, architectural decisions, front-end QA)
4. Review and merge
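
The workflow above can be sketched as a fan-out over concurrent sessions. This is only an illustration of the pattern: `run_agent`, `TaskResult`, the status strings, and the task names are hypothetical stand-ins invented for the example, not part of any real Devin API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    task: str
    status: str  # "ready_for_review" or "needs_steering" (hypothetical labels)
    notes: list[str] = field(default_factory=list)


async def run_agent(task: str, needs_judgment: bool) -> TaskResult:
    """Hypothetical stand-in for one agent session; not a real Devin call."""
    await asyncio.sleep(0)  # placeholder for long-running autonomous work
    if needs_judgment:
        return TaskResult(task, "needs_steering", ["paused for a human decision"])
    return TaskResult(task, "ready_for_review")


async def run_day(plan: dict[str, bool]) -> list[TaskResult]:
    # Kick off every planned task at once instead of watching one at a time.
    sessions = [run_agent(task, flag) for task, flag in plan.items()]
    # gather() returns results in the same order the sessions were passed in.
    return await asyncio.gather(*sessions)


# Five tasks for the day; True marks tasks expected to need expert judgment.
plan = {
    "fix flaky test": False,
    "bump dependency": False,
    "add API endpoint": True,
    "refactor logging": False,
    "front-end polish": True,
}

results = asyncio.run(run_day(plan))
to_steer = [r.task for r in results if r.status == "needs_steering"]
to_review = [r.task for r in results if r.status == "ready_for_review"]
```

The human only touches the two lists at the end: `to_steer` holds the sessions waiting on a decision, and `to_review` holds the output ready for review and merge.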

RL and the code-as-training-ground thesis

Code is a uniquely good training ground for RL because the feedback loop is automated: run the code, test it, get a pass/fail signal. This is unlike natural language, where evaluation requires human judgement or proxy metrics. Scott’s thesis at founding: high-compute RL applied to code would produce a step-change in agent capabilities. Devin is the product of that bet.
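
To make the pass/fail signal concrete, here is a minimal sketch of an execution-based reward, assuming a binary reward and plain `assert` statements as the tests. The function name `execution_reward` and the toy `add` task are made up for illustration; the episode does not describe Cognition's actual training setup, and a real system would sandbox the execution.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_reward(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> float:
    """Return 1.0 if the tests pass against the candidate code, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        # The candidate and its tests run together in a fresh interpreter.
        script.write_text(candidate_code + "\n\n" + test_code + "\n")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # a hang counts as a failure
    # Exit code 0 means every assertion held; anything else is a failure.
    return 1.0 if proc.returncode == 0 else 0.0


tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

reward_good = execution_reward(good, tests)
reward_bad = execution_reward(bad, tests)
```

No human grader appears anywhere in the loop, which is the property the thesis relies on: the same check can be run on every attempt at negligible cost.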

Contrast with imitation learning (ChatGPT era): train on text, predict the next token. Good for encyclopaedic knowledge and conversation, but insufficient for multi-step autonomous task completion. RL adds the ability to learn from failed attempts: the model tries, observes the execution result, and updates accordingly.

Eight pivots

Cognition went through approximately eight product pivots within the coding-agents space before landing on Devin’s current form. The stable anchors throughout: (1) coding, and (2) agents. The variable: how exactly the agent interfaces with teams, what tasks it owns, and what the human handoff looks like.

Bricklayer to architect

The transition Scott describes parallels the “logic designer” framing from Michael Truell (co-founder and CEO of the company behind Cursor): as AI handles implementation, the engineer’s value moves to problem definition, architecture, scoping, and review. The two differ in emphasis: “bricklayer to architect” stresses the managerial, directing side of the new role, while “logic designer” stresses the specification-writing side.

Both framings agree on the core claim: coding in the Python/C++/JavaScript sense becomes progressively less central; the ability to tell your computer what to do (at a higher abstraction level) remains the core skill.