Cat Wu on AI Product

cat-wu anthropic claude-code product-management ai-product

Speaker: Cat Wu
Host: Lenny Rachitsky (Lenny’s Podcast)
Date: 2025
Source file: raw/Cat Wu.txt

An interview with Cat Wu, head of product for Claude Code and Cowork at Anthropic, on the mechanics of shipping at extreme velocity, how the PM role is evolving, and the craft of building products on top of rapidly improving models.


What this talk covers

  1. Cat Wu’s role — partnering with Boris (tech lead), cross-functional coordination at Anthropic.
  2. PM role evolution — timelines collapsed from 6 months to 1 day; emphasis on speed-to-ship over roadmap alignment.
  3. How to move fast — clear goals, research preview, tight engineering/marketing/docs launch loop.
  4. PRDs in the AI era — metrics readouts, team principles, PRDs for genuinely ambiguous or long-horizon features.
  5. Anthropic’s success ingredients — unified mission, focus, and willingness to sacrifice individual product KRs for Anthropic’s goals.
  6. Claude Code vs Cowork vs Claude web/mobile — which surface for which task.
  7. Cowork in practice — slide deck generation, inbox management, customer briefings for Applied AI team.
  8. Internal tool building — Claude Code enabling custom apps across the company (e.g., sales deck customiser).
  9. The model eating the harness — features added as model crutches are removed as models improve.
  10. New capabilities unlocked by new models — code review product’s journey to reliability.
  11. Product vision — task → multi-Claudeing → 50–100 parallel remote Claudes.
  12. AGI-pilled calibration — the hard PM skill: eliciting current-model capability, not building for the future model.
  13. Evals — 10 good evals are worth having; an underappreciated PM responsibility.
  14. Claude’s character — why personality is core to Claude’s product success.
  15. Career advice — automate repetitive work, get automations to 100%, build things you actually use.

Key concepts

  • Product Taste — deciding what to build matters more than the ability to build it; the most durable skill as code gets cheaper.
  • Evals — concrete test cases measuring model/product behaviour; underappreciated by PMs.
  • Agentic Engineering — the “action-based” generation of AI products versus chat-based; Claude Code as the canonical example.
  • Vibe Coding — mentioned as what CLI users do when narrating tasks to Cursor/Claude Code.
  • Jagged Intelligence — model capability is uneven, strong on some tasks and weak on others; the weak spots are patched with harness scaffolding, which simplifies as models improve.

Key arguments

The model eats the harness

Features added to patch model limitations become unnecessary as models improve. The to-do list in Claude Code is the example: it was added to force completion of all call sites, because early models would stop after 5 of 20. With Opus 4+, the model completes every item on its own, so the forced reminder can be removed. At every model launch, the team reads through the entire system prompt and removes what is no longer needed.
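To make the to-do list idea concrete, here is a minimal sketch of a harness-side checklist that nags a model about unfinished items. It is not Claude Code's implementation: the `TodoList` class, the `completion_reminder` function, and the `model_needs_reminder` flag are all hypothetical names invented for illustration. The flag stands in for the decision the team revisits at each model launch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TodoList:
    """Hypothetical harness-side checklist of work items (e.g. call sites)."""
    items: list[str]
    done: set[str] = field(default_factory=set)

    def mark_done(self, item: str) -> None:
        if item not in self.items:
            raise ValueError(f"unknown item: {item}")
        self.done.add(item)

    def remaining(self) -> list[str]:
        return [i for i in self.items if i not in self.done]


def completion_reminder(todo: TodoList, model_needs_reminder: bool) -> str | None:
    """Return a reminder to append to the next prompt, or None.

    model_needs_reminder is an assumed per-model flag: True for a model that
    tends to stop early, False once a newer model finishes lists unprompted.
    """
    left = todo.remaining()
    if not model_needs_reminder or not left:
        return None
    return f"{len(left)} of {len(todo.items)} items still open: " + ", ".join(left)


# Simulate a model that stops after 5 of 20 call sites.
todo = TodoList(items=[f"call_site_{n}" for n in range(1, 21)])
for n in range(1, 6):
    todo.mark_done(f"call_site_{n}")

reminder_old = completion_reminder(todo, model_needs_reminder=True)
reminder_new = completion_reminder(todo, model_needs_reminder=False)
```

With the flag on, the harness produces a reminder listing the 15 open call sites; with it off, the scaffolding is silent. Removing a crutch then amounts to flipping one flag rather than rewriting the harness.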

Inverse corollary: new models unlock entirely new features. Code review was attempted multiple times and never reliable enough to ship. Opus 4.5/4.6 and Sonnet 4.6 made it reliable enough that Anthropic’s own engineering team uses it as a merge gate. The strategy: build the prototype early, swap in the new model when it arrives, see if the gap closes.

Calibrated AGI-pilling

“It’s very easy to build the product for the super AGI strong model. The hard thing is figuring out for the current model, how do you elicit the maximum capability?”

The most valuable PM skill in AI-native products: knowing what the current model can actually do — not the future model. This requires: (1) spending extensive time talking with and using the model; (2) asking the model to introspect on unexpected behaviour (“why did you do this?”); (3) finding 5 trusted users who can articulate model/harness quality precisely; (4) writing evals.
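Step (4) can start very small. The sketch below shows one possible shape for a handful of evals: each case pairs a prompt with a pass/fail check, and the suite reports a pass rate. Everything here is an illustrative assumption, including the `EvalCase` structure, the example prompts, and `stub_model`, which stands in for a real model call so the sketch runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the model and record pass/fail per case."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed; 0.0 for an empty suite."""
    return sum(results.values()) / len(results) if results else 0.0


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return "renamed 20 of 20 call sites" if "rename" in prompt else "I am not sure"


cases = [
    EvalCase("finishes_refactor", "rename foo to bar everywhere",
             lambda out: "20 of 20" in out),
    EvalCase("admits_uncertainty", "what is in the private repo?",
             lambda out: "not sure" in out),
    EvalCase("names_changed_files", "rename x to y and list the files",
             lambda out: ".py" in out),
]

results = run_evals(stub_model, cases)
rate = pass_rate(results)
```

Because `run_evals` takes the model as a plain callable, the same cases can be rerun against each new model to see which failures have closed.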

Product taste as the durable skill

As code becomes cheaper to write, the scarce resource is knowing what to write. Product taste (knowing which of 10,000 GitHub feature requests is worth building, and the right way to build it) can come from any background but is rare. An engineering background currently adds one extra signal: a sense of how hard something is, which affects prioritisation.

Mission as a decision technology

Anthropic’s mission (“safe AGI for all of humanity”) functions not just as branding but as an operational decision protocol. When two priorities compete, the question “which serves Anthropic’s mission?” resolves the tie, and everyone stands behind the result — including willingness to deprioritise or sacrifice individual product lines. Cat Wu: “If Claude Code failed but Anthropic succeeded, I would be extremely happy.”

Automations must reach 100%

A 95%-reliable automation is not an automation — it requires monitoring and periodic correction, which erases the time savings. The last 5% requires investing in teaching the model your preferences, giving it feedback, and iterating until the success rate is genuinely 100%. Only then do you get real leverage.
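A rough back-of-the-envelope model shows why the last 5% matters. The sketch assumes that any automation below 100% has to be reviewed on every run, since you cannot know in advance which runs failed. The function name and all the minute values are illustrative assumptions, not figures from the interview.

```python
def net_minutes_saved(success_rate: float, manual_minutes: float,
                      review_minutes: float, fix_minutes: float) -> float:
    """Expected minutes saved per run versus doing the task by hand.

    Assumption: below 100% reliability every run is reviewed, and each
    failed run additionally costs fix_minutes to correct.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate == 1.0:
        # Fully trusted: no review, the whole manual cost is saved.
        return manual_minutes
    failure_rate = 1.0 - success_rate
    return manual_minutes - review_minutes - failure_rate * fix_minutes


# Illustrative numbers: a 10-minute task, 6 minutes to review, 30 to fix.
at_95 = net_minutes_saved(0.95, manual_minutes=10, review_minutes=6, fix_minutes=30)
at_100 = net_minutes_saved(1.0, manual_minutes=10, review_minutes=6, fix_minutes=30)
```

Under these assumed numbers the 95% automation saves roughly 2.5 minutes per run, against the full 10 minutes at 100%. The gap comes almost entirely from the review step, which disappears only when the automation is fully trusted.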


Product surface map

Surface | Best for
Claude Code CLI | Most powerful; features land first; one-off or handful of coding tasks
Claude Code desktop | Frontend work; preview pane; visual users; at-a-glance session management
Claude Code web/mobile | Kicking off tasks without a laptop
Cowork | Any non-code output: decks, docs, summaries, inbox management, briefings

Relation to other talks