Mar 4 / James Kavanagh

The real work that comes after ISO 42001 certification

The gap between compliance documentation and genuine system-level assurance is where AI governance is falling down. Other industries learned this lesson the hard way.
On 14 June 2017, a fire broke out in a fourth-floor flat at Grenfell Tower in West London. Within minutes, the flames hit the building's exterior cladding and raced up the outside of the tower. Within an hour, the fire had engulfed most of the building. 72 people died. Over 70 more were injured.
Grenfell Tower had been recently refurbished. The cladding had certificates. Building regulations had been technically met. Inspections had been conducted. The compliance machinery of the building safety system had processed this building and, by its own measures, the system had worked.
Dame Judith Hackitt's subsequent independent review laid bare what had gone wrong. Not a single rogue failure, but something more systemic. She described a pervasive "tick-box" culture across the entire building safety regime, one that placed compliance above real-world safety. Organisations were treating minimum standards as a high bar to negotiate down rather than genuinely owning the safety of their buildings. The regulatory system checked that processes were followed. It didn't require anyone to make a genuine, evidence-based argument that this specific building was safe.
It's incidents like this that I think about when someone tells me their AI governance program is in great shape because they've achieved ISO 42001 certification.
If I can talk to them about it, here's the question I'll ask: pick your highest-risk deployed AI system and tell me, with evidence, why you believe it's operating safely, securely, and lawfully right now. Not how you followed the process. Not whether the documentation is up to date. Why that specific system is behaving in a way you can defend as safe.
It's not an easy question. If anyone had asked me the same about the many systems and services I took through audits and certifications, ranging from a variety of ISO certifications to multiple government certifications at Microsoft and Amazon, I would have found it challenging too.
Yet for assurance of safe, secure and lawful AI, it's the only question that matters.
Here's my blunt opinion: ISO 42001 certification is the start line, not the finish line. It means you've started the work of building the organisational machinery. It doesn't mean the machinery is doing anything useful yet. And over-indexing on regulatory compliance, treating something like EU AI Act conformity assessment as a goal rather than a baseline, is just the same trap. The real work is maintaining genuine, evidence-based assurance that each of your AI systems is doing what you claim it's doing, in conditions that keep changing. That work starts after certification, after compliance.

What a management system actually tells you

ISO 42001 is a management system standard. It's structurally identical to ISO 27001, ISO 9001, and every other ISO management system standard. Plan-Do-Check-Act. Documented processes. Internal audit. Management review. Continual improvement. It asks one fundamental question: do you have the right organisational processes, roles, and controls in place to govern AI responsibly?
Now that's not a bad question. For organisations that might have had no structured approach to AI governance at all, being forced to think systematically about risk, accountability, and oversight for the first time produces genuine organisational learning. I'm not dismissing that.
But it's the wrong question if what you actually care about is whether your AI systems are behaving acceptably. A management system tells you about organisational capability and intent. It tells you almost nothing about whether any specific system you've deployed is actually safe, fair, reliable, or fit for purpose. Those are different things, and they should not be conflated, even if it's convenient to do so.
An organisation can be fully compliant with ISO 42001 and still be deploying a system that's causing real harm. The policies are documented. The roles are assigned. The risk assessments are filed. And the model has been silently drifting into unacceptable behaviour for months because nobody is maintaining a specific, evidence-based argument about that system's performance.
I say this from experience. I spent the best part of two decades at Microsoft and Amazon Web Services, working across security, privacy, cloud infrastructure resilience, and responsible AI. I've lived inside management system compliance regimes and taken them through more audits than I can count (always on the auditee side). And I can tell you that a significant amount of what goes into management system documentation is performative.
Every certified organisation knows it. Every auditor knows it. Or they should.
Not maliciously performative. Nobody sets out to write fiction. But the messy reality of how security or resilience is actually achieved inside a hyperscale cloud provider (the informal knowledge networks, the incident response muscle memory, the engineering judgment calls that happen in the middle of the night, the rapid response mechanisms) bears very little resemblance to the neat process descriptions that satisfy an ISO 27001 auditor. The documentation describes a world that is cleaner, more linear, and more controlled than the actual world. It has to, because the standard demands a level of prescriptive tidiness that complex, fast-moving environments don't naturally produce.
So you end up with two parallel realities. The documented reality, which satisfies the audit. And the operational reality, which is where the actual security outcomes happen. Sometimes those two realities overlap a lot. Sometimes the gap is wide enough to drive a truck through. The management system certification tells you nothing about the size of that gap.
When a company of vast scale and complexity consistently produces audit reports with zero findings, that's not a sign of excellence.  It's a sign of theatre.
Now apply that same dynamic to AI governance. Organisations are building AI governance documentation that describes how they manage risk, how they assess impacts, how they monitor systems. Some of that documentation reflects genuine practice. Some of it describes aspirational processes that don't yet exist in any meaningful way. And some of it is written specifically to satisfy the structure of the standard, because the standard asks for things in a format that doesn't match how the organisation actually works. The auditor checks the documentation. The certificate gets issued. The gap between the documented governance and the operational reality remains invisible.

The safety case tradition

Other industries figured this out decades ago, often after people died.
Before the Piper Alpha disaster in 1988, offshore safety regulation in the UK was largely prescriptive. Comply with the rules and you're deemed safe. Lord Cullen's inquiry fundamentally shifted that philosophy. So instead of prescriptive compliance, the UK moved to goal-based regulation. Now the duty holder had to demonstrate, through a structured safety case, that their specific installation was acceptably safe. The regulations set goals, the operator owned the argument and the evidence.
A safety case is a structured, evidence-based argument that a specific system is acceptably safe for a specific application in a specific operating environment. It's not documentation. It's an argumentative structure with claims, sub-claims, evidence, and explicit assumptions. If the evidence is weak, the argument is visibly weak. The structure makes gaps in assurance legible in a way that management system documentation almost never does.
This approach is now being explored seriously for AI systems, and the research is worth paying attention to even if you're not building frontier models.

What the frontier AI safety case research tells us

The foundational work here is Clymer et al.'s 2024 paper, "Safety Cases: How to Justify the Safety of Advanced AI Systems." They propose four categories of safety argument: inability (the system simply can't cause the harm), control (even if it could, mitigations prevent it), trustworthiness (the system wouldn't attempt harm even if capable), and deference (for very powerful systems, using a credible AI overseer). This taxonomy is useful because it maps onto how confidence in safety degrades as capabilities increase. You start with inability arguments, which are the strongest and simplest, and progressively need more sophisticated arguments as models get more capable.
Buhl, Sett, Koessler, Schuett, and Anderljung at the Centre for the Governance of AI built on this in their October 2024 paper, exploring how safety cases could function in both industry self-regulation and government oversight. They sketch how a safety case could inform major deployment decisions, with a designated safety case team producing the argument, an internal review team challenging it, and leadership making the call. That's a fundamentally different decision architecture from "the risk assessment says medium, ship it."
The most revealing example is Anthropic's own work. Roger Grosse published three sketches of what safety case components might look like for ASL-4 level systems, covering mechanistic interpretability, AI control, and incentives analysis. What's particularly honest about this work is the explicit acknowledgment that none of the sketches fully succeeds. The scenario assumes a model where developers can't rule out the possibility that it could cause a catastrophe, and also can't rule out that the model could strategically sandbag evaluations or undermine monitoring. That's not a great starting position for building a safety argument, and they're transparent about the gaps.
Separately, Korbak, Clymer, and colleagues published a control safety case sketch in early 2025, focused on a concrete scenario: arguing that a hypothetical LLM agent deployed internally won't exfiltrate sensitive data. The safety argument rests on three claims: the red team adequately elicited model capabilities, control measures remain effective in deployment, and you can conservatively extrapolate from test to production. It's narrow and specific enough to actually evaluate, which is exactly the point.
The UK's AI Security Institute has been doing complementary work, publishing a safety case template for "inability" arguments as a proof of concept. They're upfront that this only covers a subset of relevant arguments and that a full safety case would also need sociotechnical arguments about organisational safety culture and staff competence. That last point matters. It's an explicit acknowledgment from the institution most engaged in frontier AI safety that technical safety cases alone aren't sufficient without the organisational layer.
All of these are great reads - if you can spare an afternoon to go through all five, you'll enjoy it. But I think you'll notice two things about this body of work, or at least these are the two that struck me as important.
First, it's almost entirely focused on frontier model risks. CBRN threats, autonomous replication, loss of control, deceptive alignment. Important stuff, genuinely. But not what 99% of organisations deploying AI systems need to think about.
Second, everything published so far is a sketch or template, not an actual safety case for an actual deployed system. The methodology is being developed ahead of the practice. We're in the theory-building phase, which is where process safety was in the early 1990s after Cullen but before the practice matured.
I think what matters for most practitioners is not the frontier-specific details. It's the underlying principle: if you can't make a structured, evidence-based argument about why a specific system is acceptably safe in a specific context, your governance is incomplete regardless of what your management system documentation says.

The gap that matters for practitioners

I think the gap that governance practitioners should care about is not at the frontier. It's in every organisation running an AI system that makes or influences consequential decisions about people. Screening CVs, triaging insurance or benefits claims, assessing loans, the new agentic pricing system.
For each of these systems, someone in the organisation should be able to make a structured argument with three components (there's a sketch of what that might look like after the list):
  • First, what specific claims are we making about this system's behaviour? Not vague aspirations like "it's fair" or "it's responsible", and, almost as bad, "it's in scope of our ISO 42001 certification". Operational claims, like "this system does not produce materially different approval rates across protected demographic groups, with a disparate impact ratio above 0.9 on every tracked dimension".
  • Second, what evidence do we have that those claims hold, and how current is that evidence? So often, if an impact assessment was even done, it was before deployment, eighteen months ago. The bias testing was run on the training data, not on production outputs. The monitoring dashboard exists but nobody reviews it. The evidence that once supported the claims has gone stale, and nobody noticed because the management system doesn't require anyone to maintain the argument.
  • Third, what conditions would cause those claims to stop holding, and how would we detect that? This is the most important piece. It forces you to think about the failure modes of your assurance, not just the failure modes of your system. Distribution shift. Changed user population. Model updates. New regulatory requirements. Upstream data quality degradation. If you haven't identified the conditions under which your assurance degrades, you have no way of knowing when your governance has stopped working.
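To make that concrete, here is a minimal sketch of what such an argument could look like as a structured record rather than a paragraph in a risk register. It's illustrative only: the class names, fields, thresholds and the CV-screening example are assumptions I'm making for the sketch, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Evidence:
    description: str    # e.g. "disparate impact measured on production decisions"
    collected_on: date  # when the evidence was last produced
    max_age_days: int   # how long before this evidence is considered stale

    def is_stale(self, today: date) -> bool:
        return today - self.collected_on > timedelta(days=self.max_age_days)


@dataclass
class AssuranceClaim:
    claim: str                                               # specific, operational claim
    evidence: list[Evidence] = field(default_factory=list)
    degradation_conditions: list[str] = field(default_factory=list)

    def gaps(self, today: date) -> list[str]:
        """Reasons this claim can no longer be defended as current."""
        found = []
        if not self.evidence:
            found.append("no evidence recorded for this claim")
        found += [f"stale evidence: {e.description}"
                  for e in self.evidence if e.is_stale(today)]
        if not self.degradation_conditions:
            found.append("no conditions defined under which the claim would stop holding")
        return found


# Illustrative record for a hypothetical CV-screening system
claim = AssuranceClaim(
    claim="Approval rates do not differ materially across protected groups "
          "(disparate impact ratio above 0.9 on every tracked dimension)",
    evidence=[Evidence("disparate impact measured on production outputs",
                       collected_on=date(2024, 6, 1), max_age_days=90)],
    degradation_conditions=["distribution shift in the applicant population",
                            "model or prompt updates",
                            "upstream data quality degradation"],
)
print(claim.gaps(date.today()))  # missing or stale evidence shows up as a visible gap
```

The value isn't the code. It's that missing evidence, stale evidence and undefined degradation conditions become legible gaps you can put in front of an owner, instead of staying buried in documentation.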

From compliance to genuine assurance

I'm not advocating for practitioners to abandon the management system. But we've got to stop treating it as the governance end goal and start treating it as just the infrastructure that enables governance.
Your risk assessment process should produce system-specific claims, not generic risk register entries. Your monitoring process should produce evidence that those claims still hold, not dashboards that nobody interrogates. Your management review should answer a meaningful question: do our assurance arguments for deployed systems even still stand? And if not, what's changed and what are we doing about it?
This creates the feedback loop that static compliance lacks. When the evidence weakens, when the operating context shifts, when the claims no longer hold, the assurance argument visibly degrades. That degradation becomes a governance signal. It tells you something needs to change. Compare that with standard management system compliance, where you can maintain perfect documentation while a system drifts quietly into causing harm.
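To make "evidence that the claims still hold" concrete, here is a minimal sketch of a recurring check that recomputes the disparate impact ratio from recent production decisions and flags any tracked dimension where the claimed threshold no longer holds. Everything in it is an assumption for illustration: the dataframe of decisions, the binary "approved" outcome column, the group columns, and the helper functions in the usage comments exist only in this example.

```python
import pandas as pd

DISPARATE_IMPACT_THRESHOLD = 0.9  # taken from the claim, not a universal rule


def disparate_impact(decisions: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Ratio of the lowest group approval rate to the highest (four-fifths-rule style)."""
    rates = decisions.groupby(group_col)[outcome_col].mean()
    return rates.min() / rates.max()


def failed_dimensions(decisions: pd.DataFrame, tracked_dimensions: list[str]) -> dict[str, float]:
    """Dimensions where the claimed threshold no longer holds on recent production data."""
    failures = {}
    for dim in tracked_dimensions:
        ratio = disparate_impact(decisions, group_col=dim, outcome_col="approved")
        if ratio < DISPARATE_IMPACT_THRESHOLD:
            failures[dim] = ratio
    return failures


# Hypothetical usage against a recent sample of production decisions:
# decisions = load_recent_decisions(days=30)   # however your pipeline exposes them
# failures = failed_dimensions(decisions, ["sex", "age_band", "ethnicity"])
# if failures:
#     raise_governance_signal(failures)         # a degraded claim should interrupt someone
```

Again, the tooling is beside the point. What matters is that the check runs on production outputs rather than training data, and that a failed check lands on a named owner rather than on a dashboard nobody reviews.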
I recognise there's a cultural shift here too. Management system compliance distributes governance responsibility into process roles. The risk owner ticks the box. The auditor checks the box was ticked. Management reviews the summary. It's kind of nice and convenient to diffuse accountability - when things go wrong, it was a process failure. But an assurance argument concentrates accountability around a specific claim about a specific system. Someone has to own it. At Amazon we called it single-threaded ownership. You stand behind it and say "I believe this system is operating acceptably, and here's why." That's a fundamentally different posture from "we followed the process."

Proportionality and pragmatism

The objection I hear from practitioners is that this sounds like a lot of extra work when they're barely keeping the management system afloat. Fair point.
The answer is proportionality. You don't need a formally structured safety case with Goal Structuring Notation for a low-risk internal chatbot. But for the systems that actually matter, the ones making or materially influencing consequential decisions about real people, the discipline of articulating and maintaining a specific assurance argument is the difference between governance that works and governance that performs.  
A practical starting point: take your single highest-risk deployed AI system and the question I opened this article with. Write down the three components of the assurance argument. What claims are you making? What evidence supports them? What would cause them to fail? Then notice where the gaps are. It takes five minutes.
Those gaps are your real governance priorities. Not whatever's next on the audit checklist, or some obscure update to a policy.

The race ahead

Some days I feel like the AI governance and AI safety field is where offshore safety was in the early 1990s. The philosophy of goal-based, evidence-driven assurance is gaining traction, and the methodology is slowly developing. But the actual practice of building, challenging, and maintaining system-level assurance arguments for real AI systems still lies ahead for most organisations.
ISO 42001 certification is a genuine achievement. It means you've built the first foundations of the organisational machinery. But the machinery needs a purpose beyond generating documentation and passing audits. That purpose is maintaining defensible, evidence-based assurance that the AI systems you operate are doing what you claim they're doing, and that they remain safe, secure and lawful.
In my previous article on adaptive governance, I used Waymo's Driver-Simulator-Critic architecture to show what governance looks like when it's designed as a continuous feedback loop rather than a point-in-time assessment. The system senses problems, tests improvements, verifies them, and deploys. Continuously. It doesn't wait for an annual review. It doesn't depend on someone remembering to check.
The three-part assurance argument I've described in this article (specific claims, current evidence, defined degradation conditions) is exactly that kind of mechanism applied to AI governance more broadly. It creates the feedback loop that static compliance lacks. When your evidence goes stale, you know. When your operating context shifts, the argument visibly weakens. When your claims no longer hold, the gap is legible rather than buried in a risk register nobody reads.
That's the difference between governance that performs and governance that works. Between a management system that satisfies auditors and an assurance practice that actually keeps your systems safe.
You got to the start line. Now drive.
Thank you for reading, and especially to all of you within our practitioner community.  
If you want to build the kind of AI governance that works for real, covering both the theory and the practice, then the AI Governance Practitioner Program is for you.

References

1. Grenfell Tower Inquiry. https://www.grenfelltowerinquiry.org.uk
2. Dame Judith Hackitt, "Building a Safer Future: Independent Review of Building Regulations and Fire Safety" (2018). https://www.gov.uk/government/publications/independent-review-of-building-regulations-and-fire-safety-final-report
3. Clymer, Gabrieli, Krueger & Larsen, "Safety Cases: How to Justify the Safety of Advanced AI Systems" (2024). https://arxiv.org/abs/2403.10462
4. Buhl, Sett, Koessler, Schuett & Anderljung, "Safety Cases for Frontier AI" (2024), Centre for the Governance of AI. https://arxiv.org/abs/2410.21572
5. Grosse, "Three Sketches of ASL-4 Safety Case Components" (2024), Anthropic Alignment Science. https://alignment.anthropic.com/2024/safety-cases/
6. Korbak, Clymer, Hilton, Shlegeris & Irving, "A Sketch of an AI Control Safety Case" (2025). https://arxiv.org/abs/2501.17315
7. UK AI Security Institute, "Safety Case Template for Inability Arguments". https://www.aisi.gov.uk/blog/safety-case-template-for-inability-arguments
8. UK AI Security Institute, "Safety Cases at AISI". https://www.aisi.gov.uk/blog/safety-cases-at-aisi
9. Cârlane & Gomez, "Dynamic Safety Cases for Frontier AI" (2024). https://arxiv.org/abs/2412.17618
10. Simon Mylius, "Systematic Hazard Analysis for Frontier AI using STPA" (2025). https://arxiv.org/abs/2506.01782
11. Anthropic, "Responsible Scaling Policy". https://www.anthropic.com/responsible-scaling-policy