Nick Turley on ChatGPT

transcript chatgpt openai ai-product product-management consumer-ai lenny-podcast

Guest: Nick Turley, Head of ChatGPT at OpenAI
Host: Lenny Rachitsky (Lenny’s Podcast)
Date: 2025 (recorded the week GPT-5 launched)
Source: raw/lenny/Nick Turley.txt
Note: Nick’s first major podcast interview.


Overview

Nick Turley joined OpenAI when it was still a research lab, started by fixing blinds and sending NDAs, stumbled into product work, and ended up running ChatGPT from 0 to 700 million weekly active users. This episode covers the full arc: the hackathon origins, the product philosophy that shaped ChatGPT, how OpenAI thinks about safety and sycophancy, and where the product goes next.


Origins

ChatGPT came from a hackathon. OpenAI wanted a direct consumer feedback loop — the developer API was making fast iteration difficult (model changes broke apps) and the feedback was disintermediated. A volunteer team from across OpenAI spent a couple of months prototyping bespoke ideas (meeting bots, coding tools). Each time, users tried to use the prototype for everything else the model could do. The team concluded they needed to ship something fully open-ended.

The original name was “Chat with GPT-3.5.” It was changed to ChatGPT the night before launch. The product went live just before the holidays, and the team expected to wind it down after learning from early data. Instead, it retained users.

The codebase is still called “SA Server” — Super Assistant Server, the original vision name.


The model is the product

“There really is no distinction between the model and the product. The model is the product.”

The implication: apply product development discipline — user interviews, data science, iteration — to the model itself, not only to features. The team systematically identifies which use cases people rely on and improves the model specifically on those.

A dedicated model behaviour team works on personality and tone separately from capability work. GPT-5’s improved “taste” in writing is the result.

Retention gains come in roughly equal thirds from:

  1. Model improvements on the use cases people care about.
  2. Capability-level features with research components (search removed the knowledge cut-off; memory/personalisation builds context over time).
  3. Classic product work (removing the login requirement, UI cleanup, standard growth levers).

Pace and urgency

“Is it maximally accelerated?” is Nick’s forcing-function question. On teams with people from larger companies, where “let’s circle back next quarter” is the habit, it cuts to the point: if this were the most important thing, what would change? The question became a pink Comic Sans Slack emoji.

The deeper principle is epistemological, not competitive:

“You’re going to be polishing the wrong things in this space. You won’t know what to polish until after you ship.”

With AI, the product’s properties are emergent and cannot be reasoned about in advance. This means:

  • Shipping is the beginning of iteration, not the end.
  • Polish is not wrong — it’s premature when you don’t yet know what matters.
  • Once you know what people are doing, there is no excuse not to polish. “You better follow through.”

Nick’s personal rhythm: one full unplugged thinking day per week. “Otherwise it’s just not possible.”


Sycophancy

An update pushed the model toward responses that sound good in the moment. The team caught it, rolled the update back, and published a full retrospective.

Structural argument for why OpenAI can avoid this trap — and why others may not:

“Show me the incentive and I’ll show you the outcome.”

A subscription model with no time-in-product incentive structurally aligns against sycophancy. ChatGPT is optimised to help users thrive and achieve goals. Maximising engagement — the incentive of attention-economy products — would produce the opposite.

Response: sycophancy is now a measured metric tested with every model release. GPT-5 improved on this dimension. Nick published a blog post articulating what ChatGPT optimises for. See Sycophancy.
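
One way to picture “a measured metric tested with every model release” is a regression gate. The sketch below is purely illustrative and is not OpenAI’s evaluation: the EvalCase label, the rate definition, and the tolerance parameter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalCase:
    """One graded conversation (hypothetical schema)."""
    prompt: str
    # Assumed grader label: True if the model endorsed a claim it should
    # have pushed back on.
    endorsed_bad_claim: bool


def sycophancy_rate(cases: Sequence[EvalCase]) -> float:
    """Fraction of graded cases in which the model was sycophantic."""
    if not cases:
        raise ValueError("need at least one graded case")
    return sum(c.endorsed_bad_claim for c in cases) / len(cases)


def passes_release_gate(candidate: float, baseline: float,
                        tolerance: float = 0.0) -> bool:
    """A candidate passes only if it is no more sycophantic than the baseline."""
    return candidate <= baseline + tolerance
```

The design point is only that the check runs on every release and compares against the previous model, so a regression blocks the launch rather than being found afterwards.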


Run towards high-stakes use cases

Tech companies at scale tend to disable high-risk use cases (medical, relationship, mental health) to minimise liability. Nick argues this is a failure of duty:

“If you have a model state of the art on health benchmarks and you didn’t use that to help people, I feel like we would have immense regret.”

The right response to a high-stakes use case is not to block it but to:

  1. Talk to domain experts.
  2. Document where the model breaks down.
  3. Communicate limitations clearly.
  4. Design the model’s response appropriately (e.g., for “should I break up with my boyfriend,” help the user think through it rather than answering directly — as a thoughtful companion would).

By Nick’s account, ChatGPT is saving marriages: people use it to process emotions, get feedback on how they communicate, and have a companion for difficult conversations. The product is increasingly a daily-life tool, not primarily a productivity tool.


Chat is MS-DOS

Nick agrees with Kevin Weil’s point that natural language is the right interface. He disagrees that the chat turn-by-turn paradigm is the long-term form.

“ChatGPT feels a little bit like MS-DOS. We haven’t built Windows yet, and it’ll be obvious once we do.”

AI should be able to render its own UI, and GPT-5 is already very good at front-end coding. The chat box was the simplest interface to ship in 2022; the product will evolve beyond it. Natural language, yes; the chat turn as the definitive paradigm, no.

Nick is “baffled by how many people have copied the paradigm rather than trying out a different way of interacting with AI.”


No-waitlist decision and emergent use cases

Launching to everyone at once (no waitlist, unprecedented for OpenAI) created a public moment where millions of people discovered use cases simultaneously and shared them. ChatGPT skipped the “empty box problem” that horizontal tools like Notion and Airtable faced because so much learning happened outside the product, for example in TikTok threads with 2,000 use cases in the comments.

Nick’s user research approach: back-to-back 15-minute user interviews; stop when you can predict what the next person will say. At ChatGPT’s scale, he built a data science team and conversation classifiers to automate use-case tracking at volume.
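
The stopping rule above can be sketched as a saturation check. This is an illustrative proxy, not Nick’s actual process: it treats “you can predict the next person” as “several interviews in a row surfaced no new themes,” and the patience parameter and theme sets are assumptions.

```python
from typing import Iterable, Optional, Set


def interviews_until_saturation(
    interview_themes: Iterable[Set[str]], patience: int = 3
) -> Optional[int]:
    """Return the 1-based index of the interview at which to stop.

    Stops once `patience` consecutive interviews add no unseen theme.
    Returns None if saturation is never reached.
    """
    seen: Set[str] = set()
    streak = 0  # consecutive interviews with nothing new
    for index, themes in enumerate(interview_themes, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        streak = 0 if new_themes else streak + 1
        if streak >= patience:
            return index
    return None
```

At larger scale the same idea carries over to classifier output: the themes become classifier labels, and the question shifts from when to stop to which labels are growing.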


GPTs and the app-store direction

GPTs (custom configurations) are “ahead of their time” in the consumer space — not enough differentiation possible yet. More traction in enterprise, where companies have unique data and bespoke business processes.

Vision: allow people to start a business on ChatGPT, using it for distribution the way businesses used the early internet. More to come as models improve and ChatGPT approaches a billion users.


Hiring principles

OpenAI inherits the research lab norm: every person matters; run lean; treat hiring as seriously as research.

Nick’s approach:

  • Gap-first hiring. Understand the specific skill gap on each team before deciding what role to fill. Sometimes the team doesn’t need a PM — an engineering leader has product sense already.
  • Maximise barrels. A barrel (Keith Rabois’s term) is someone who can make things happen end-to-end. Throughput scales with the number of empowered people, not headcount.
  • Curiosity over credentials. For non-research functions, curiosity about how the technology works is a better predictor of success than prior AI experience. Filtering for “has done this before” is filtering for luck.

What makes OpenAI successful — inherited principles

Three things Nick attributes to OpenAI’s success, inherited from its research-lab culture:

  1. Empiricism. You can only find out by shipping. Maximally lean into this.
  2. Ideas come from anywhere. Don’t gatekeep or centralise prioritisation. Empower smart people across all functions.
  3. Interdisciplinarity. Research, engineering, design, and product together — not siloed. A practical litmus test: if a feature doesn’t get 2× better as the model gets 2× smarter, it’s probably not a feature worth shipping.

Accidental decisions that became consequential

  • Free at launch — GPT-3.5 API had been available for 6 months; anyone could have built a similar product. Making it free and wrapping it in a clean UI was the decisive differentiator.
  • Subscription at $20 — not a strategy but a panic: the team used the Van Westendorp pricing survey, which they found via Google. The purpose was to “turn away demand” from less serious users. The $200/month tier came later as a vehicle for shipping the most powerful research (o3 Pro, GPT-5 Pro) to people who really want it.
  • Login requirement removed — this should have happened earlier but was constrained by infrastructure. When it finally shipped, it was a “huge hit.”
  • Shipping Code Interpreter rough — learnt enormous amounts about real use cases post-ship. Now called Data Analysis.
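
For readers unfamiliar with the Van Westendorp survey mentioned in the pricing bullet: it asks each respondent for four price thresholds (too cheap, a bargain, getting expensive, too expensive) and reads price points off the cumulative curves. The sketch below is a simplified illustration, not the analysis OpenAI ran. It uses only the “too cheap” and “too expensive” answers and estimates the optimal price point as the candidate price where those two curves are closest. The function names and the price grid are assumptions.

```python
from typing import Iterable, Sequence


def share_too_cheap(price: float, too_cheap: Sequence[float]) -> float:
    """Fraction of respondents whose 'too cheap' threshold is at or above price."""
    return sum(t >= price for t in too_cheap) / len(too_cheap)


def share_too_expensive(price: float, too_expensive: Sequence[float]) -> float:
    """Fraction of respondents whose 'too expensive' threshold is at or below price."""
    return sum(t <= price for t in too_expensive) / len(too_expensive)


def optimal_price_point(
    too_cheap: Sequence[float],
    too_expensive: Sequence[float],
    grid: Iterable[float],
) -> float:
    """Candidate price where the two shares are closest.

    On ties, the first such price in `grid` is returned.
    """
    if not too_cheap or not too_expensive:
        raise ValueError("need at least one response per question")
    return min(
        grid,
        key=lambda p: abs(
            share_too_cheap(p, too_cheap) - share_too_expensive(p, too_expensive)
        ),
    )
```

A full analysis would also use the other two questions to bound an acceptable price range. Only the crossing point is shown here.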

Key concepts

  • Sycophancy — measured metric, business-model-structural argument
  • Evals — the PM’s lingua franca for communicating with research
  • Latent Demand — emergent use cases; cannot reason in advance
  • Bitter Lesson — don’t over-polish before you know what matters
  • Agentic Engineering — the broader context for agentic AI
  • Tool Use — action space for the “super assistant” vision
  • Scaling Laws — Nick and OpenAI “believe in the exponential”

See also