Verifiability
Verifiability is the property of a domain that makes it possible to automatically check whether a given output is correct. In the context of large language models, verifiability is the structural reason why AI capabilities cluster in maths, code, and logic rather than in aesthetics, judgment, or nuanced writing.
Why verifiability determines progress
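The reinforcement learning (RL) training loop requires a reward signal: the model generates outputs, receives scores, and trains on what scored highly. A reward signal in turn requires a verifier, something that can score an output without a human in the loop. Domains differ in whether one exists: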
- Maths: run the computation, check the answer. Verifiable.
- Code: execute the program, check the output or run the tests. Verifiable.
- Chess: win or lose. Verifiable.
- Aesthetics: is this poem beautiful? No automatic verifier exists.
- Judgment: did this advice improve the person’s life? No automatic verifier exists (and the feedback loop is too long).
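To make the verifiable cases concrete, here is a minimal Python sketch of two verifiers and the generate-and-score part of the loop. It is an illustration under stated assumptions, not a description of how any lab implements RL: `toy_policy` is a hypothetical stand-in for a model, all function names are invented for this example, and keeping only accepted samples is a simple filtering step rather than an actual training update.

```python
import random


def verify_maths(expected: int, answer: str) -> float:
    """Return 1.0 if the answer parses to the expected integer, else 0.0."""
    try:
        return 1.0 if int(answer.strip()) == expected else 0.0
    except ValueError:
        return 0.0


def verify_code(source: str, func_name: str, cases) -> float:
    """Return 1.0 if the candidate function passes every test case, else 0.0.

    Caution: exec runs arbitrary code; a real environment would sandbox it.
    """
    namespace = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        passed = all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and crashes all score zero.
        return 0.0
    return 1.0 if passed else 0.0


def toy_policy(a: int, b: int, rng: random.Random) -> str:
    """Hypothetical stand-in for a model: usually right, sometimes off by one."""
    return str(a + b + rng.choice([0, 0, 0, 1, -1]))


def collect_rewarded_samples(problems, samples_per_problem=8, seed=0):
    """Generate answers, score them, and keep only those the verifier accepts."""
    rng = random.Random(seed)
    kept = []
    for a, b in problems:
        for _ in range(samples_per_problem):
            answer = toy_policy(a, b, rng)
            if verify_maths(a + b, answer) == 1.0:
                kept.append(((a, b), answer))
    return kept
```

Nothing comparable can be written for the aesthetics or judgment bullets: there is no `expected` value to compare against, which is exactly the gap the list describes.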
Labs build RL environments in domains where verifiers are available and the resulting capability is economically valuable. Where verification is hard, progress is slower, not because the domain is fundamentally harder, but because the RL infrastructure has not been built.
The chess example
Chess capability spiked between GPT-3.5 and GPT-4, reportedly not because of general capability gains but because someone at OpenAI added a large chess corpus to pretraining. A decision to invest resources produced the capability spike. Chess is also a verifiable domain: games have an unambiguous result, so the quality of play can be checked automatically.
Implications for practitioners
Finding tractable opportunities: if you are in a domain with verifiable structure — where you could construct RL environments and evaluation datasets — you may be able to unlock capability through fine-tuning that the base model does not have out of the box. Verifiability makes the domain tractable, even if the frontier labs have not focused there yet.
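One way to test whether a domain has this kind of verifiable structure is to try writing an evaluation set whose answers can be checked mechanically. The sketch below assumes a toy unit-conversion domain; `EVAL_SET`, `answers_match`, and `pass_rate` are hypothetical names, and `model` is any callable that maps a prompt string to an answer string.

```python
# Hypothetical evaluation set: each prompt has exactly one checkable answer.
EVAL_SET = [
    ("Convert 2 km to metres", "2000"),
    ("Convert 150 cm to metres", "1.5"),
    ("Convert 3 m to centimetres", "300"),
    ("Convert 500 m to kilometres", "0.5"),
]


def answers_match(expected: str, answer: str) -> bool:
    """Compare numerically so '1.50' and '1.5' count as the same answer."""
    try:
        return abs(float(answer) - float(expected)) < 1e-9
    except ValueError:
        # Non-numeric output cannot be a correct answer in this domain.
        return False


def pass_rate(model, dataset, verifier=answers_match) -> float:
    """Fraction of prompts for which the verifier accepts the model's answer."""
    if not dataset:
        raise ValueError("dataset is empty")
    passed = sum(
        1 for prompt, expected in dataset if verifier(expected, model(prompt))
    )
    return passed / len(dataset)
```

Running `pass_rate` on the same dataset before and after fine-tuning gives a direct measure of whether the investment unlocked anything.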
Understanding gaps: arbitrary-seeming capability gaps (see Jagged Intelligence) often reflect the absence of RL investment in a particular domain. The gap is not fundamental — it reflects what has been built, not what is possible.
Almost everything can be made verifiable to some degree. Even writing can be evaluated by a council of LLM judges. The question is whether verification is easy or hard, not whether it is possible or impossible.
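As a rough illustration of the council idea, the sketch below aggregates several judges into one score. The judges here are plain callables returning a number; in practice each would wrap a call to a language model with its own rubric, which is not shown. Taking the median is an assumption of this example, chosen so that a single outlier judge cannot dominate the result.

```python
from statistics import median
from typing import Callable, Sequence

# A judge maps a piece of writing to a score, nominally in [0, 1].
Judge = Callable[[str], float]


def council_score(text: str, judges: Sequence[Judge]) -> float:
    """Return the median judge score, with each score clamped to [0, 1]."""
    if not judges:
        raise ValueError("at least one judge is required")
    scores = [min(1.0, max(0.0, float(judge(text)))) for judge in judges]
    return median(scores)


# Hypothetical stand-ins for LLM judges, used only to exercise the function.
def brevity_judge(text: str) -> float:
    return 1.0 if len(text.split()) <= 30 else 0.4


def strict_judge(text: str) -> float:
    return 0.2


def lenient_judge(text: str) -> float:
    return 0.9
```

A score like this is noisier and easier to game than a unit test, which is the sense in which verification for writing is hard rather than impossible.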