May 27 / James Kavanagh

How I Fix Broken AI Governance

A practical approach to identifying the causes of governance problems, and then designing the right fixes

There's a story about Taiichi Ohno, the engineer who built the Toyota Production System. I don't know if it's true or not, but I like it. Apparently, he used to draw a chalk circle on the factory floor and make a manager stand in it. Then he'd leave, sometimes for hours. And the manager's job was just to watch. No clipboard, no checklist, no framework, nothing. Just observe how work actually flows, where things get stuck, where people have invented workarounds that aren't in any process document, and where the procedure so carefully drawn out and posted on the wall bears no resemblance to what's actually happening in front of them.

If you're entering the AI governance field as a career, you might imagine your work will be designing an elegant governance framework from the ground up. Or perhaps auditing sophisticated AI management systems against well-defined standards. Or perhaps navigating obscure procedures of legislation like the EU AI Act to classify a system as "high-risk" whether or not it poses any risk at all. Or designing and implementing new technical controls for evaluation of models and guardrails for their safe execution. Maybe you will.

But more likely, your experience will be quite different from all of these, and honestly much more interesting and meaningful. Because if you stop, if you stand in the chalk circle, and observe, what you'll usually find is a bit of a mess and a puzzle. You'll see things that sort of work, things that were designed for different purposes, things that exist on paper but not in practice, and things that do happen in practice but not the way they exist on paper. And your job becomes figuring out how to get it all working effectively, bit by bit. I find that's the real practical skill of AI governance. It doesn't start with mapping the EU AI Act, issuing an audit report, implementing a GRC tool or writing a Pulitzer Prize-winning policy. It starts with standing in the circle and watching.

Let me go through a few. Maturity models are everywhere. You rate each capability 1-5 across dimensions, produce a spider chart, present it to the board. Executives love them because they're visual and they track year-on-year. Bonus points if you draw sparklines on the spreadsheet. The problem is that a "Level 3" in incident management tells you nothing about what's actually broken. You get a score without a diagnosis. It's like a doctor telling you your health is a 6 out of 10 and sending you home.

Clause-by-clause gap analysis against a standard is probably the most common method. Take ISO 42001, go through every clause, mark each as conformant, partially conformant, or non-conformant. This seems rigorous and it has its place. But it operates at the control level and only superficially at that, assessing whether a control is claimed to be in place or not, whether there are some sampled artifacts as 'evidence'. It doesn't really get at the work. You might mark incident management as "partially conformant" but that finding doesn't tell you whether the tooling is wrong, the ownership is missing, or the culture is resistant.

GRC platform-driven approaches give you a dashboard and a compliance tracker. You import the control framework, assign owners, click through a few wizards, collect evidence. Present the sparkline-infused dashboard like magic. The tool structures the work, and that has value. But it doesn't teach anyone how to diagnose whether a control is actually functioning. You end up tracking whether someone uploaded a document, or ticked a box, not whether a mechanism is working to meaningfully improve outcomes.

Then there are much more technical and complex approaches: failure modes analysis, five-whys, threat modelling, red-teaming, simulation, safety cases, and even system theoretic process analysis (if you want to bring out the big guns). These have analytical power, they get at detail, and some at least have deep roots in safety science and engineering. They're fun if you geek out on safety science. But they're specialist tools that require specialist skills, and most organisations adopting AI governance just don't have the expertise, time or budget to deploy them broadly. They have their place, particularly for high-risk systems, and as a practitioner you want a variety of tools in your toolbox, but they're not where most governance practitioners will start their diagnostic work.

My honest assessment is that most of these partially work (except questionnaires - they're useless). Gap analysis and audit approaches are rigorous but stop at "there's a gap." Maturity models give you a communication tool for leadership but not a diagnostic tool for practitioners. GRC platforms give you tracking but not insight. Questionnaires give you self-reported opinions, not observed reality. Sophisticated technical approaches are expensive and narrow. And some of these unfortunately devolve into theatre, ticking boxes to satisfy a certification body or a board paper, without anyone asking whether the governance is meaningful.

That article focused on the design of mechanisms. But before you can design mechanisms, you need to assess what already exists. And that assessment is a craft in itself. It's diagnostic work, and it takes time to develop a method that works for you. That method (or at least my method) is what I want to share in this article, because I think it's one of the skills that can help governance practitioners produce real value and avoid the trap of producing impressive-looking frameworks that neither create value nor last.

So imagine you're in a mid-sized company in an industry like education or healthcare, and they've been building and deploying some software systems for their own use and to sell. They've been operating for a few years. They have an IT incident process that handles production outages and security events. It works reasonably well for those purposes. They hold ISO 27001 certification, maybe SOC2, so the process was built to satisfy information security requirements. And they need those because, operating in health or education, they have sensitive personal data and compliance requirements to meet.

And now they're building AI into those systems, consuming AI from external service providers too. Most employees are using some AI tools in their daily work whether the company knows about it or not. And AI governance lands on the agenda. The EU AI Act requires serious incident reporting for high-risk systems. ISO 42001 requires nonconformity and corrective action processes. NIST AI RMF expects incident response and documentation. Whether you start with compliance mapping or with risk identification (and that's a topic for another article), you recognise that they need controls in, say, the incident management domain. And those controls need operational mechanisms behind them.

Nothing exists. The control is on paper, but there's no operational reality behind it. Nobody has built anything for real. This is a straightforward gap and the easiest to diagnose, though not necessarily the easiest to fix.
Informal practice. This is by far the most common finding, and it's the one that matters most to understand. People are doing things. Engineers respond to production issues. Someone handles client complaints about AI outputs. There are habits, Slack channels, people who get called. But none of it was designed as a governance mechanism. Instead, it just grew from operational necessity. It overlaps with what the control requires, but usually only partially and is often quite rigid.
A mechanism exists but its scope doesn't match. The organisation has intentionally built something, but it was built for different purposes. That IT incident process designed for ISO 27001 handles system outages and security breaches. It was never extended to cover AI-specific failure modes like bias drift, safety degradation, or adversarial misuse. So sure, the mechanism is real, but it just doesn't cover what the AI governance control requires.
A mechanism exists and it's properly scoped. Celebrate, go to step 2! Sadly, at least in my experience, this is the least common finding, especially when organisations first start governing AI seriously.
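
The four findings above can be read as a small decision procedure. Here is a minimal sketch in Python, assuming three yes/no observations that you would make in the chalk circle. The names `MechanismState` and `classify_mechanism`, and the three parameters, are hypothetical labels for illustration and not part of any standard.

```python
from enum import Enum


class MechanismState(Enum):
    NOTHING_EXISTS = "nothing exists"
    INFORMAL_PRACTICE = "informal practice"
    SCOPE_MISMATCH = "mechanism exists but its scope doesn't match"
    PROPERLY_SCOPED = "mechanism exists and is properly scoped"


def classify_mechanism(
    operational_activity: bool,   # are people actually doing something in practice?
    intentionally_built: bool,    # was it deliberately designed as a mechanism?
    covers_control_scope: bool,   # does it cover what the AI governance control requires?
) -> MechanismState:
    """Map three observations to one of the four Step 1 findings."""
    if not operational_activity:
        # A control on paper with no operational reality counts as nothing.
        return MechanismState.NOTHING_EXISTS
    if not intentionally_built:
        # Habits and Slack channels that grew from operational necessity.
        return MechanismState.INFORMAL_PRACTICE
    if not covers_control_scope:
        # For example, an ISO 27001 incident process never extended to AI failure modes.
        return MechanismState.SCOPE_MISMATCH
    return MechanismState.PROPERLY_SCOPED


# The IT incident process from the example: real, designed, but scoped for security.
print(classify_mechanism(True, True, False).value)
```

Real findings are rarely this clean, so treat the booleans as prompts for what to look for rather than as a scoring tool.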

But then you have to look at what it doesn't cover. Someone reports that the AI-powered university admissions tool consistently recommends less challenging courses to students from certain demographics. A clinician notices that a patient triage tool seems to deprioritise referrals from a particular postcode. These are likely AI incidents. They may be serious incidents under the EU AI Act's definition. But they don't look like outages, and they don't trigger monitoring alerts. They arrive as informal complaints and somebody just handles them informally, or not at all. You only see this by pausing and carefully watching. Standing in the chalk circle.

Inputs. These are about what triggers the mechanism and what information feeds it. There are likely existing channels that work for what they were built for. Monitoring detects system outages. Client reports come through support tickets. Engineers escalate production issues. But none of these channels were designed to detect AI-specific failure modes. Bias drift doesn't generate a monitoring alert. A safety safeguard degrading over months doesn't trigger an escalation. A data quality issue corrupting scores without producing errors looks like normal operation from the inside. The inputs that exist are ok. They just don't cover the territory the control addresses. That's a design issue.
Outputs. What the mechanism produces and who consumes it. So the informal process I described produces fixes. Something breaks, someone fixes it, everyone moves on. But you have to go back and look at what the governance controls require. Structured incident records with classification data. Root cause analysis docs. Notifications that contain the information Article 73 of the EU AI Act specifies for serious incidents. Corrective action plans that feed into improvement cycles. The informal process most likely produces none of this. And think about the knock-on effects, like how the outputs of one mechanism become the inputs to other mechanisms downstream. If incident management doesn't produce structured corrective action plans, whatever mechanism is supposed to track corrective actions has nothing to consume. Poorly designed outputs in one mechanism starve every mechanism downstream. Design issue again.
Tooling. So let's say the company has a general IT ticketing system. It handles what it was built for. But look at what the AI governance controls need. Incident classification that distinguishes a bias-related incident from a service degradation. Templates that capture the fields a regulator requires. Workflow routing that sends a safety incident to a different response team than a performance glitch. The ticketing system isn't configured for this - it probably could be, it just hasn't been designed for it. Design issue. Now you have to play out why this matters beyond the tooling component itself. You see, the strongest form of adoption is architectural, where using the mechanism is easier than not using it. And if the tooling required AI-specific classification before a ticket could progress, incidents would be classified correctly by default. But the tooling doesn't support that, which means architectural adoption for AI incidents is impossible. A design gap in tooling creates an adoption constraint. It's a design issue with knock-on effects on other components in the mechanism - so this is most likely an issue to fix first.
Ownership. Alright, let's assume an engineering lead handles incidents when they occur, and they're good. But nobody designed incident management ownership into the governance structure, so it's not in their objectives, they've no time allocated for it. They just use their personal relationships to pull together cross-functional teams for post-incident review. So there's an owner, but they're not fully equipped to properly own the mechanism. My ownership test is pretty simple: I think genuine ownership requires accountability (as in your performance is assessed on whether the mechanism works), authority (you can direct the resources and people it requires), and capacity (you have the time and bandwidth). Remove any one of these and ownership is nominal, not real. You guessed it, design gap.
Adoption. This is where something interesting happens. So for production outages, the informal process is adopted. The system is down, clients are affected, people mobilise. That works. But that doesn't work for AI incidents for the reasons I discussed under tooling. This selective adoption is one of the hardest patterns to diagnose because it looks like the mechanism is working. If you count tickets, the numbers look cool. The gap is in what never becomes a ticket. And the gap is both a design problem (the tooling doesn't support it, the inputs don't detect it) and an adaptive problem (nobody thinks of these observations as "incidents" in the first place). Think about it this way: if the tooling was fully capable, but people still wouldn't use it because they don't think a drifting AI recommendation is an incident, then that's not just a design issue. Both an operational and a design issue here.
Inspection. Ok, so clearly here we have a problem - without tooling and adoption, nobody can tell you how many AI-related incidents were detected. Nobody tracks whether detected anomalies actually became incident tickets. Nobody measures whether triage classifications were accurate, whether the same root causes are recurring, how long it takes from detection to resolution, or whether regulatory notification deadlines were met. There are no leading indicators telling you whether the mechanism is running. There are no lagging indicators telling you whether it succeeded. And without indicators, there are no targets to measure against. Design issue.
Continuous improvement. And this is where everything connects. The improvement cycle depends on inspection data. Indicators deviate from targets. Those deviations signal that something needs to change. The change could target any component: update the inputs, redesign the outputs, extend the tooling, strengthen adoption, tighten inspection itself. The mechanism owner does that inspection and drives the improvement continuously. They're not waiting for some line in an audit report once a year. They're continuously monitoring, reporting and adjusting or fixing. That's how a mechanism adapts over time. But without inspection, the improvement cycle has nothing to work with. No indicators means no deviations. No deviations means no signal. No signal means no improvement. The same categories of incident recur, and each time they're treated as new. The mechanism also can't absorb regulatory change. When a delegated act clarifies Article 73 reporting requirements, a mechanism with a functioning improvement cycle absorbs that as routine input. A mechanism without one requires a special project every time. Given the pace of regulatory development right now, that doesn't scale. Design gap.
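
To make the tooling point concrete, here is a rough sketch of what "architectural adoption" could look like: a ticket cannot leave the `new` state until it carries an AI classification, and the classification decides which team receives it. Everything here is an assumption for illustration. The category names, the team names in `ROUTING`, and the `triage` function are hypothetical and do not describe any particular ticketing product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AICategory(Enum):
    NOT_AI_RELATED = "not_ai_related"
    BIAS = "bias"
    SAFETY = "safety"
    PERFORMANCE = "performance"


# Hypothetical routing table: a safety incident goes to a different
# response team than a performance glitch.
ROUTING = {
    AICategory.NOT_AI_RELATED: "it-operations",
    AICategory.BIAS: "ai-governance-response",
    AICategory.SAFETY: "ai-governance-response",
    AICategory.PERFORMANCE: "engineering-on-call",
}


@dataclass
class Ticket:
    summary: str
    ai_category: Optional[AICategory] = None
    status: str = "new"
    assigned_team: Optional[str] = None


def triage(ticket: Ticket) -> Ticket:
    """Move a ticket to 'triaged', refusing if it has no AI classification."""
    if ticket.ai_category is None:
        # The gate: classification happens by default because the
        # workflow cannot continue without it.
        raise ValueError("AI classification is required before a ticket can progress")
    ticket.assigned_team = ROUTING[ticket.ai_category]
    ticket.status = "triaged"
    return ticket


ticket = triage(Ticket("Triage tool deprioritises one postcode", AICategory.BIAS))
print(ticket.status, ticket.assigned_team)
```

The gate only solves the design half of the adoption problem. If nobody thinks to open a ticket for a drifting recommendation in the first place, the required field never gets a chance to do its job.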

Now when I'm finished mentally working through all seven components, I step back from the component-by-component findings and look at the profile as a whole. In this case, almost every finding is a design gap. Some components have operational presence but are designed for the wrong territory. Others were never designed at all. The people may be competent, and sometimes it works through heroics. But the channels that exist only function within their designed scope, and they just weren't designed for the governance territory.
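
One way to hold that profile is a simple map from component to the kinds of issue found, tallied at the end. This is a sketch under my own simplifying assumptions: the labels `"design"` and `"adaptive"` compress the findings from the worked example, and the `findings` dictionary and `profile` function are hypothetical names.

```python
from collections import Counter

# Findings from the worked incident management example, one entry per component.
findings = {
    "inputs": {"design"},
    "outputs": {"design"},
    "tooling": {"design"},
    "ownership": {"design"},
    "adoption": {"design", "adaptive"},  # the one component with both kinds
    "inspection": {"design"},
    "continuous_improvement": {"design"},
}


def profile(component_findings):
    """Count how many components show each kind of issue."""
    return Counter(
        kind for kinds in component_findings.values() for kind in kinds
    )


summary = profile(findings)
print(summary["design"], summary["adaptive"])  # prints: 7 1
```

The tally is only a summary. The useful detail stays in the per-component entries, which is where the specific fixes come from.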

I can't stand vague "needs improvement" findings. Even when findings are specific, they're often cherry-picked or almost random in what they cover, and they usually just state the control gap without any clear view of what specific change is needed. But with this method, at least you now have a list of specific component-level issues that are hopefully actionable. In our example, it might tell you that inputs need extending to detect AI-specific failure modes, that outputs need restructuring for multiple consumers, that tooling needs AI-specific classification, that ownership needs formalisation with authority, that adoption has both a design problem and an operational training problem, that inspection needs to be built from the ground up, and that continuous improvement has no signal to work with until inspection exists. Each of those points to a different kind of work. Which brings me to step #3: what kind of issue is it?

Other gaps require adaptive intervention. They need people to change how they think about the work. Adoption is selective because people don't see AI-specific incidents as "real" incidents. That's a mental model problem. Ownership is informal because nobody wants accountability for a process that crosses functional boundaries. That's an organisational politics problem. Post-incident reviews don't happen because the culture treats reporting as blame rather than learning. That's a values problem. No amount of tooling or process design fixes any of these issues.

Applying a technical fix to an adaptive issue is the single most common mistake in compliance improvement (at least in my distinctly biased, personal experience). You buy the platform. Nobody uses it. The dashboards go unread. Six months later someone asks why incident management hasn't improved, and the answer is that the problem was never the tool. The problem was that nobody believed AI-specific incidents warranted the same response as system outages. There's no procurement decision or tool implementation that solves that.

So at this stage, you have a big long list of specific issues. They form a component-level map, each classified by type, each pointing to a specific kind of intervention. The mechanism owner (if they exist) owns all those issues. The natural instinct at this point is to design some form of transformation program. A big piece of work with a project plan, a steering committee, a timeline measured in quarters, and a target state that represents "done." Looks pretty, consultants love it, and ... it's mostly pointless (IMHO :) ).

This is Ohno's legacy again. The Toyota Production System advocates against a single transformation program. It works on the idea of thousands of small improvements made by the people doing the work, each one informed by what they observed in the chalk circle. Governance improvement works the same way. Add AI-specific classification fields to the ticketing system this sprint. Update the triage criteria next week. Run a single tabletop exercise with the response team the week after. Formalise the ownership role in next month's objectives cycle.

Second, and more importantly, it puts improvement in the hands of the mechanism owners. The engineering lead who handles incidents, the data science team lead, the operations manager. These are the people who understand the operational reality. They know where the friction is. They know which changes will be adopted and which will be ignored. When they own the improvement, the changes stick. When a central team designs improvements and hands them over, the adoption issue we just diagnosed will show up all over again in a different form.

Don't get me wrong. This doesn't mean there's no coordination. Someone needs to hold the diagnostic map, track which gaps are being addressed, and make sure the sequencing makes sense. Someone needs to manage budget. Technical fixes that enable architectural adoption should usually come early, because they reduce the adaptive burden. Adaptive work starts in parallel, because cultural shifts and mental model changes take longer than platform configurations. Inspection gets established as soon as possible, even in rudimentary form, because it provides the data that tells you whether your interventions are actually working.
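
Rudimentary inspection can be very small. As a sketch, assuming you can get a handful of numbers out of the ticketing system, the following compares each indicator with its target and returns the ones that deviate, which is the signal the improvement cycle needs. The indicator names, values and targets are invented for illustration, and `Indicator` and `improvement_signals` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    value: float
    target: float
    higher_is_better: bool = True

    def deviates(self) -> bool:
        """True when the observed value misses the target."""
        if self.higher_is_better:
            return self.value < self.target
        return self.value > self.target


def improvement_signals(indicators):
    """Return the names of indicators that deviate from their targets."""
    return [i.name for i in indicators if i.deviates()]


# Illustrative numbers only.
indicators = [
    Indicator("anomaly_to_ticket_rate", value=0.40, target=0.90),
    Indicator("median_days_to_resolution", value=12, target=5, higher_is_better=False),
    Indicator("notifications_on_time_rate", value=1.0, target=1.0),
]

print(improvement_signals(indicators))
# prints: ['anomaly_to_ticket_rate', 'median_days_to_resolution']
```

With an empty list of indicators the function returns an empty list: no indicators, no deviations, no signal.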

So that's my method, hope it helps to share it. Genuinely, I think the real work of AI governance starts in the chalk circle. You stand in it. You observe what's actually happening, not what the policy says should happen, or the textbook said would happen. Then diagnose: what exists, how well it's scoped, where the components are strong and where they fall short, whether each issue needs a technical fix or adaptive work. And then improve, in really small continuous increments owned by the people who run the mechanisms.

This kind of diagnostic thinking is central to what I teach in the AI Governance Practitioner Program, and especially in the upcoming Compliance and Risk speciality courses. That's not because I think everyone needs my specific approach. Practitioners create their own craft. But I think the practice of assessing and improving governance mechanisms is the skill that lets practitioners enjoy making a real difference rather than succumb to the frustration of governance theatre. None of us enjoy that particular show.

Approaches to diagnosis

How thinking in adaptive mechanisms can help diagnose and fix your broken governance

Step 1: Does a mechanism exist?

Step 2: Is the mechanism working?

Step #3 What kind of issue are we seeing?

Step #4 Plan and do your interventions

Wrapping Up