AI Safety Levels

AI Safety Levels (ASL) are Anthropic’s framework for categorising the risk posed by model capabilities, set out in its Responsible Scaling Policy (RSP). The framework defines which deployment and safety commitments are required at each level.

The levels

Level	Capability threshold	Risk characterisation
ASL-1	Below GPT-2	Negligible risk
ASL-2	Basic capability; early GPT era	Limited risk; standard safeguards sufficient
ASL-3	Meaningful uplift for creating weapons of mass destruction	Moderate risk from misuse; current Claude models (2025)
ASL-4	Potential for significant loss of human life from misuse	High risk; requires substantially stronger safety measures
ASL-5	Potential extinction-level outcomes if misaligned or misused	Catastrophic risk; may require halting development until safety proven

ASL-3 — current status

As of 2025, Anthropic classifies its most capable Claude models at ASL-3. The evidence cited: in controlled expert evaluations, Claude gives a bad actor seeking to create a bioweapon measurable uplift beyond the baseline of what Google Search alone provides. Anthropic has testified to Congress about this capability.

The ASL-3 designation triggers specific safety commitments: enhanced red-teaming, additional deployment restrictions, and a published policy.

Purpose of the framework

Operationalises safety claims. Instead of a general assurance that “we care about safety,” ASL ties concrete commitments to measurable capability thresholds.
Enables responsible scaling. Anthropic can continue developing models while publishing the conditions under which they would need to pause.
Builds trust with policymakers. Publishing both the risks and the commitments gives legislators a concrete basis for evaluation.

The “God in a box” problem

Ben frames ASL-4 and ASL-5 in terms of historical AI safety theory, where the early concern was keeping a superintelligent system contained and aligned. The irony with language models is that people are actively pulling the “God out of the box” by giving models full internet access, credentials, and broad agency. The ASL framework is Anthropic’s attempt to gate capability deployment on safety maturity.

Where mainstream views differ