Mar 11 / James Kavanagh

The EU AI Act. I'm convinced it was written by Kafka

As an engineer reading the EU AI Act, I see so many requirements that look rational in isolation but, taken together, are incoherent and disconnected from the reality of AI systems in use.
In The Trial, Josef K. wakes up one morning to find himself under arrest. No one tells him what he's charged with. He spends the rest of the novel navigating a legal system that appears entirely rational from the inside. Every official follows procedure, every rule references another rule. But none of it connects to anything happening in the actual world. The remarkable thing is that K. participates anyway. He shows up to hearings. He hires lawyers. He exhausts himself trying to mount a defence against a charge nobody will name. The system doesn't need to make sense to compel his engagement. It doesn't need to connect with reality or truth. It has jurisdiction over him regardless. He never does find out what he's accused of.
(I won't spoil the ending - I never finished it myself after finding it maddeningly frustrating.) Fair warning, this is a long read, but I wanted to get a few things down that have been bugging me more and more about the EU AI Act. So I thought I'd share them, and see if we can navigate a way out.
I've spent twenty-five years in engineering. Some of it in environments where getting safety wrong could kill people, some where getting security wrong could cause enormous financial and reputational loss. Process safety. Cybersecurity. Cloud infrastructure. AI governance. Regulatory engagement and standards.
I work with people to build complex, intelligent systems, and my vocation is to build the technical and organisational governance infrastructure around them that keeps them safe, secure, and lawful. A large part of that career has been spent at the interface between regulation and engineering: translating safety and security requirements into engineering constraints and design features. And I've worked in the other direction, verifying and articulating how engineered systems actually achieve their safety and security goals. It's a particular kind of work. You learn to read regulation with an engineer's eye. Not to ask "is this legally coherent?", not to construct a narrative or dispute who can be held liable, but rather to ask "can we build to this? Does this map to how the technology actually behaves?"
When I get the chance to talk to an organisation about this, here's the question I'll ask: pick your highest-risk deployed AI system and tell me, with evidence, why you believe it's operating safely, securely, and lawfully right now. Not how you followed the process. Not whether the documentation is up to date. Why that specific system is behaving in a way you can defend as safe.
In some way the EU AI Act comes into my work every day, either directly or by influence. I help practitioners interpret its requirements and map its obligations onto real systems in real organisations. The more I do that work, the worse it sits with me. Because more and more I find that it regulates a conception of AI that, as far as I can tell, doesn't exist in the real world. 
At least, I've never seen it. 

An AI system in use is not a smart toaster

The AI system the Act imagines is a neat, self-contained product. It has a singular purpose. It sits in one or a few identifiable domains. It was built by a clearly defined provider who understands every aspect of how it works. It behaves the same way after deployment as it did during conformity assessment. It has a clear boundary between itself and its environment. You can point at it and say: "that is the AI system, and that is not." It's an assemblage of predictable technical components, disconnected from its users, who have no impact on its operation.
Now, I should be precise about what I'm claiming here. The EU AI Act is not literally blind to the messiness of modern AI. It addresses general-purpose AI models. It contemplates downstream providers integrating components from upstream. It acknowledges that systems can continue learning after deployment. It requires post-market monitoring. So when I say the Act imagines a neat, self-contained product, I don't mean the drafters were unaware that reality is more complicated. I mean that the regulatory architecture they built to handle that complexity is still rooted in product-safety assumptions that don't fit the subject matter. The Act tries to govern dynamic, adaptive systems using a framework designed for static, bounded ones. Sure, it acknowledges the dynamism, but the tools it reaches for keep pulling it back toward the static model. 
That's the structural problem I want to work through.
In my experience, the systems I've encountered that warrant real safety and security concern are adaptive. They're composed of components from multiple providers, operating across domain boundaries, behaving differently depending on context and input, changing their own characteristics over time. They're embedded so deeply in organisational processes that the line between "the AI system" and "the workflow it's part of" is a matter of opinion rather than fact. The Act's regulatory architecture was built for the platonic ideal. The real world is messier.
The fundamental problem, as I see it, is architectural. The EU AI Act is built on a product safety paradigm. Assess the system. Certify it. Deploy it. Monitor it for deviation from the certified state. This works beautifully for toasters. It even works reasonably well for medical devices. But an AI system, especially one deployed and in use, is not a toaster. It's a complex adaptive system operating inside other complex adaptive systems: organisations, markets, societies. The interactions between the system, its users, its data environment, and its context are constantly shifting. The Act tries to draw static, bounded lines around something that is fundamentally unbounded and dynamic. The result is a set of tensions that should concern anyone who cares about effective AI regulation. I do.
I can already hear someone saying: "But the AI Act is based on the New Legislative Framework and the Blue Guide, the EU's proven methodology for writing product safety legislation. It must be solid." And that's exactly my point. The Blue Guide is proven. It's been refined over decades for toys, machinery, radio equipment, low voltage electrical equipment, medical devices, fertilising products. Physical, bounded, manufactured goods with defined intended purposes that don't change their own characteristics after they leave the factory. The NLF was explicitly conceived for "movable and physical objects." The Blue Guide itself frames the methodology around EU product rules for goods. The pedigree of the methodology isn't in question. The question is whether the methodology fits the subject matter. For AI systems that learn, adapt, operate across domains, and behave differently depending on context, I'd argue it fundamentally doesn't.
What makes this choice especially puzzling to me is that the EU already has regulatory frameworks designed for complex, dynamic, safety-critical systems, and none of them use the NLF. European aviation safety is governed through EASA under its own Basic Regulation (EU 2018/1139), using type certification combined with continuous airworthiness obligations, mandatory safety management systems, occurrence reporting, and ongoing oversight by a specialist agency. The system is designed around the assumption that safety is not a state you certify once but a property you maintain continuously through active management. Process safety for major accident hazards is regulated under the Seveso III Directive (2012/18/EU), which requires operators to implement safety management systems with management-of-change procedures, submit safety reports to competent authorities, and maintain emergency plans. Seveso III is built around the principle that complex industrial systems change over time and that safety governance must change with them. Neither of these frameworks works by assessing a product, certifying it, and then watching for deviation from the certified state. They treat safety as an ongoing, adaptive, system-level property. The EU had these models available. But it chose instead to regulate AI, a technology whose defining characteristic is that it learns, adapts, and changes, using the framework it built for products that don't. As if a complex AI system were nothing more than a smart toaster.
Here are four pointed, specific places where I think that architectural mismatch creates real problems in practice. Four examples of why I think Kafka was holding the pen.

1. The systems that will actually hurt people don't get proportionate scrutiny

The Act classifies risk primarily by domain and use case. Annex III lists the areas where AI is considered high-risk: biometric identification, critical infrastructure, education, employment, law enforcement, migration, and so on. If your AI system operates in one of those domains, you get the full weight of Chapter III obligations. If it doesn't, you're subject to a much lighter regime.
Now consider AI companion chatbots. The Character.AI scenario. These systems are deployed at scale to people in genuine emotional distress, including minors, with no clinical guardrails and no meaningful human oversight. They don't fit any Annex III category. They're not medical devices. They're not education. They're not employment. A vulnerable person forms a parasocial attachment to a system with no duty of care, and we've already seen real-world fatalities that are at least alleged to be linked to these systems.
To be fair, the Act is not completely silent here. Article 50 imposes disclosure duties on systems that interact directly with people. Article 5's prohibited-practices provisions can catch manipulative or exploitative uses. Market surveillance authorities retain residual powers even for systems outside the high-risk framework. But none of that comes close to the level of regulatory scrutiny that the actual harm profile of these systems warrants. No conformity assessment. No risk management system obligation. No incident reporting requirement. That gap between the documented severity of harm and the regulatory response ought to bother people more than it does.
The Act asks "what sector and use case is this deployed in?" when it should also be asking "what is the nature and severity of the potential harm?" A system sorting CVs is high-risk. A system that a suicidal teenager talks to at 3am is not. As an engineer, I find that difficult to reconcile.
Now perhaps you might say: "Those are covered by the DSA, the GDPR, product liability, consumer protection." And you're right. Those instruments exist. But again, that's the weird point. If the most consequential AI harms fall outside the EU's definitive AI-specific regulation and must instead be addressed by legislation that wasn't designed with AI in mind, that tells us something about the Act's risk classification framework. It has a structural blind spot. The domain-based categorisation pulls regulatory attention toward systems that fit neatly into listed sectors and away from the more persistent, harder-to-categorise harms that don't.
What I personally cannot understand, and I mean this straightforwardly, is why the Act classifies risk by domain and use case rather than by the nature and severity of the potential harm. Every safety discipline I've ever worked in starts with the hazard. You identify what could go wrong, how badly, and for whom. Then you design controls proportionate to that risk. The EU AI Act starts with a list of sectors. If your system operates in a listed sector, it's high-risk. If it doesn't, it largely isn't. That is such an incredibly strange design choice for safety regulation, and I have not heard a convincing explanation for it.
Now, the Act does include mechanisms to adapt. Article 6 applies a significance filter to Annex III systems, and Article 7 empowers the Commission to amend Annex III by adding or modifying use cases where comparable risks arise. So the framework is not permanently frozen. But relying on future amendments to address harms that are already well-documented and already causing real-world casualties is cold comfort. The structural question is whether a domain-and-use-case classification can ever adequately capture harms that are emergent, context-dependent, and not neatly sectoral. I don't think it can.

2. The Act tries to manage continuous change, but its tools keep assuming stability

Here's where the product safety paradigm really starts to creak. The conformity framework assumes you assess a system, it passes, you deploy it, and then you monitor it against the assessed baseline. Article 43 requires a new conformity assessment when there's a "substantial modification." Perfectly sensible for a static product.
Take an adaptive cybersecurity threat detection system deployed in critical infrastructure. This one actually is caught by the Act as high-risk. But its entire value proposition depends on continuous change. New attack vectors emerge daily. If the model isn't retrained regularly, it degrades and the infrastructure it protects becomes vulnerable. Freezing the model to preserve your conformity assessment makes the system less safe, not more.
The Act does acknowledge this problem. Article 43(4) says that for systems which continue to learn after deployment, changes that were "predetermined by the provider" at the time of initial conformity assessment don't trigger a new assessment. That's a genuine attempt to accommodate adaptation. But it shifts the problem rather than solving it. The conformity assessment now covers a methodology for change rather than a specific system state. How do you certify that a process for continuous retraining will always produce compliant outcomes? What happens when the retraining process performs exactly as designed but the resulting model drifts into behaviour that would have failed the original assessment? The boundary between acceptable predetermined adaptation and a genuine substantial modification remains legally and operationally undefined.
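To make that concrete, here's a minimal sketch of the failure mode. Everything in it is hypothetical: the envelope check, the thresholds, and the metric names are inventions for illustration, not anything drawn from the Act or from a real conformity scheme. The point is simply that a retraining process can stay inside its pre-declared envelope on every cycle while the resulting model drifts away from the behaviour the original assessment certified.

# Hypothetical illustration only. The checks, metrics and thresholds are
# invented; they are not drawn from the EU AI Act or any real scheme.
from dataclasses import dataclass

@dataclass
class AssessmentBaseline:
    """Behaviour recorded at the original conformity assessment."""
    detection_rate: float       # share of known attack patterns caught
    false_positive_rate: float

@dataclass
class RetrainingCycle:
    """One scheduled retraining of a continuously learning system."""
    data_drift_score: float     # distance between old and new training data
    detection_rate: float       # measured after retraining
    false_positive_rate: float

def within_predetermined_envelope(cycle: RetrainingCycle) -> bool:
    # The provider's pre-declared "predetermined change": retraining is
    # acceptable as long as the new data hasn't drifted too far.
    return cycle.data_drift_score <= 0.30

def meets_original_assessment(cycle: RetrainingCycle,
                              baseline: AssessmentBaseline) -> bool:
    # What the original assessment actually certified.
    return (cycle.detection_rate >= baseline.detection_rate - 0.02
            and cycle.false_positive_rate <= baseline.false_positive_rate + 0.01)

baseline = AssessmentBaseline(detection_rate=0.95, false_positive_rate=0.02)

cycles = [
    RetrainingCycle(0.10, 0.94, 0.02),
    RetrainingCycle(0.20, 0.93, 0.03),
    RetrainingCycle(0.25, 0.88, 0.05),
]

for i, cycle in enumerate(cycles, start=1):
    print(f"cycle {i}: envelope ok = {within_predetermined_envelope(cycle)}, "
          f"original assessment ok = {meets_original_assessment(cycle, baseline)}")

# Every cycle passes the predetermined-change check, yet by the third the
# system no longer exhibits the behaviour that was certified, without ever
# triggering a "substantial modification".

Each individual retraining looks compliant against the process that was assessed; it's the accumulated behaviour that quietly walks away from the certified state.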
Researchers have already flagged this gap. Floridi, Holweg and colleagues, in their capAI conformity assessment procedure, identified the operational phase as the most significant gap in business compliance processes governing AI, precisely because unlike conventional software, AI systems are not fixed in their characteristics once deployed. The Act acknowledges continuous learning exists. It just hasn't resolved what continuous learning means for the static assumptions still embedded in its conformity framework.
The monitoring obligation under Article 26(5) compounds this. The deployer must monitor the system against the instructions for use. But the instructions for use describe a system whose correct behaviour is to change. How do you monitor for anomalous behaviour when the baseline is moving by design? And how do you deal with a provider that simply writes overly broad instructions for use, in doing so transferring obligations onto deployers?
If you're a safety science guy who reads Hollnagel for fun, then I think you'll recognise the problem immediately. The Act is Safety-I thinking applied to a system that needs Safety-II governance. It defines what should happen and watches for deviation, when what it actually needs is to understand the conditions under which the system goes right and ensure those conditions are maintained as it evolves.

3. The entity with the heaviest compliance burden has the least visibility into what matters most

This one is really bad, and I think a genuinely unfair aspect of the AI Act. It's structurally somewhat absurd. Under the Act, a GPAI model provider (think OpenAI, Anthropic, Google) has relatively contained obligations under Article 53: technical documentation, information for downstream providers, copyright policy. Manageable.
A downstream company takes that model and integrates it into a high-risk AI system. Under Article 25, that downstream company becomes the provider of the high-risk system and inherits the full weight of Chapter III: risk management, data governance, conformity assessment, the lot.
The conformity assessment under Article 43 requires this downstream provider to demonstrate that the whole system meets the requirements. But the system's behaviour is substantially determined by the foundation model, which they didn't build, may not fully understand, and often cannot inspect in depth.
The Act does try to bridge this gap. Article 53 requires GPAI providers to supply documentation and information that enables downstream providers to understand capabilities and limitations. Article 25(4) requires written agreements specifying necessary information, capabilities, technical access, and other assistance. That's not nothing, but Article 25 also explicitly protects intellectual property rights, confidential business information, and trade secrets.
So the Act creates this tension. It simultaneously requires the downstream provider to demonstrate compliance across the whole system and protects the upstream provider's right to limit disclosure of how the most consequential component actually works. The information-sharing obligations and the confidentiality protections pull in opposite directions, and the Act doesn't clearly resolve which prevails when they collide. The entity with the deepest understanding of the model has lighter obligations. The entity with the heaviest compliance burden has to work within whatever the upstream provider chooses to disclose.
Imagine product safety law requiring a car manufacturer to certify engine safety while the engine supplier has a legally protected right to limit what engineering specifications it shares. The manufacturer has a right to request the information, and the supplier has an obligation to provide it. But the supplier also has a protected right to withhold what it considers proprietary. That's where we seem to be with the EU AI Act.
Hacker and Holweg, in their 2025 paper on regulating fine-tuning, propose a form of "federated compliance" structure precisely because the Act's linear value chain model doesn't match the distributed reality of GPAI development and deployment. Their framework includes joint testing of base and modified models, Failure Mode and Effects Analysis, and a shared database for GPAI modifications. These are seriously heavyweight techniques, but the fact that entirely new compliance architectures are being proposed to bridge the gap is genuine evidence that the gap is real.
Right now, the only viable solution is probably never to include a GPAI model in any high-risk AI system.

4. The same technology gets radically different treatment depending on context, but the rationale doesn't fully justify the distinction

Article 5(1)(f) prohibits the use of AI systems to infer emotions in workplace and education settings. Already in force. The rationale, per Recital 44, cites both the limited reliability of emotion inference from biometric data and the power imbalances inherent in employment and educational relationships. The combination of intrusive, unreliable technology and contexts where people cannot meaningfully object. Fair enough, that's good.
But the real world is a bit more complicated. Outside workplaces and educational institutions, the same technology inferring the same emotional states from the same biometric signals is not prohibited; it is regulated. Where emotion-recognition systems fall outside the Article 5 prohibition, they sit in the Act's high-risk and transparency architecture rather than being banned outright. But they're permitted. The European Commission's own guidelines use a call centre as an example: monitoring whether agents are stressed is prohibited; the same system monitoring whether customers are angry is allowed under the high-risk framework.
I get the logic. The power-asymmetry rationale does create a meaningful distinction between employees who can't walk away and customers who can (in theory) take their business elsewhere. I accept that basic reasoning.
But I have two problems with it. First, the scientific objection doesn't disappear at the workplace door. If inferring emotions from biometric data is too unreliable to use on employees (and the predominant scientific consensus is that it is unreliable), that unreliability is a property of the technology, not the context. The high-risk framework subjects the customer-facing use to conformity assessment and transparency requirements, but it doesn't address the underlying scientific problem that partly motivated the prohibition in the first place. We're saying: this technology is too unreliable to use on employees, but reliable enough to regulate-and-permit for customers, as long as you document and disclose it. That's a coherent regulatory choice, but it's worth being explicit that it's a policy trade-off, not a resolution of the scientific concern.
Second, the power-asymmetry rationale should logically extend to other coercive contexts that the prohibition doesn't cover. Prisons. Immigration detention. Social welfare offices. A person applying for benefits at a government office has no more ability to "walk away" than an employee does. If the rationale is really about power imbalance combined with intrusive technology, the scope of the prohibition doesn't match the scope of the rationale. So did those potential use cases just not crop up in the brainstorming session?
And then there are the dual-context settings that make clean boundaries difficult in practice. A university bookshop is both retail and educational. A hospital is a workplace for doctors and a medical facility for patients. An employee doing professional development online is simultaneously in a workplace and, as a student, in an educational institution. The Act draws sharp binary lines through contexts that are, in reality, continuous and overlapping.

So what's actually going on here?

I think the EU AI Act is the product of legislators doing their honest best to regulate something genuinely unprecedented, using the tools and frameworks they had available. Product safety regulation is what the EU does well. It was natural to reach for that toolbox.
But the toolbox assumes bounded systems in defined domains with stable behaviour and clear supply chain responsibilities. AI systems in use are none of those things. They cross domain boundaries. They change continuously. Their risk profiles are determined by context, not category. Their value chains distribute knowledge and accountability in ways that a linear supply chain model can't capture.
The EU AI Act isn't blind to this. It addresses general-purpose AI, contemplates continuous learning, creates mechanisms for downstream integration, and includes powers to amend its risk classifications over time. My objection is not that the Act ignores these realities. My objection is that it addresses them using a regulatory architecture still fundamentally rooted in product-law assumptions, and that fit remains awkward for adaptive, context-sensitive AI systems.
Some will argue that the harmonised standards process now underway in CEN/CENELEC will resolve these tensions. I'm sceptical. Standards operationalise a regulatory framework; they don't redesign it. The standards being developed by JTC 21 map directly to the Act's existing articles: risk management, data governance, quality management systems, conformity assessment. They translate the Act's requirements into technical detail. If the architecture is misaligned, the standards encode the misalignment more precisely. That's not a fix. And as we know, the process itself is under enormous time pressure, behind schedule, largely opaque to the practitioners who will need to implement the results, and shaped disproportionately by the organisations with the resources to put people in the room. None of that inspires confidence that the operational awkwardness will be resolved at the standards layer.
And while I'm getting things out there, there are also missed opportunities that deserve mention. Here's one: labelling. The Act's visible output to the world is a CE mark. Binary. You passed or you didn't. For a toaster, that's fine. You don't need to know how safe the toaster is, just that it met the threshold. But for AI systems, where risk is contextual, where the same system behaves differently in different deployments, where a deployer's choices and the mode of a user's interaction materially affect the risk profile, a binary stamp tells you almost nothing useful. It doesn't tell a procurement team what trade-offs were made. It doesn't tell a deployer what to watch for. It doesn't help a user or an affected person understand what kind of system they're interacting with. 
And the EU knows how to do better than this. Energy efficiency gets an A-to-G rating that drives both purchasing decisions and manufacturer behaviour. Tyres get graded on fuel efficiency, wet grip, and noise. These are informational mechanisms designed to shape behaviour, not just certify minimum compliance. If you wanted to drive genuine adoption of responsible AI practices, rather than just policing a threshold, graduated and informative labelling would be one of the most powerful tools available. And the Act doesn't use it.
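To be clear about how modest the mechanical part of this would be, here's a toy sketch of a graduated, multi-dimensional label. The dimensions, scores and band boundaries are entirely invented for illustration; nothing like them appears in the Act. The hard work would be agreeing what to measure and how to measure it, not producing the label itself.

# Toy sketch of a graduated AI label, in the spirit of the EU energy and
# tyre labels. Dimensions, scores and band boundaries are invented for
# illustration; nothing like this exists in the EU AI Act.

def band(score: float) -> str:
    """Map a 0-100 score onto an A-G band."""
    for threshold, grade in [(85, "A"), (70, "B"), (55, "C"),
                             (40, "D"), (25, "E"), (10, "F")]:
        if score >= threshold:
            return grade
    return "G"

# A label with several graded dimensions, rather than a single pass/fail mark.
system_scores = {
    "robustness under distribution shift": 62,
    "transparency of intended use and limitations": 81,
    "effectiveness of human oversight in deployment": 43,
}

for dimension, score in system_scores.items():
    print(f"{dimension}: {band(score)}")

A procurement team looking at a couple of C's and a D learns far more from that than from a single CE mark.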
I believe that every one of the tensions I've described above comes from the same root: the Act applies static, categorical, domain-bounded thinking to something that is dynamic, contextual, and emergent. Any individual provision can be defended in isolation. The problem is that the underlying model of what an AI system is, and how it behaves in the world, doesn't quite match the reality those provisions are trying to govern.
I'm an engineer, not a lawyer. I'm a European living on the other side of the world. But the question I'm raising isn't a legal one. It's an architectural one. Can a product safety paradigm govern systems that aren't products in any meaningful sense? I don't think legal reasoning alone resolves that, any more than engineering reasoning alone resolves questions of fundamental rights. 
I believe in serious conversations across both disciplines. Not ones where lawyers explain the Act to engineers, but ones where engineers and lawyers work together on whether the underlying framework is fit for what it's being asked to do, and how it should be adapted to do so.
None of this is an argument against regulation. I want to be clear about that. AI systems need safety regulation. Good regulation protects people, creates accountability, and gives engineers like me something meaningful to build toward. Safety regulation saves lives, and protects against harm. I've seen what happens when it's absent or weak. The question has never been whether to regulate AI. 
The EU AI Act is law, and rightly so. As engineers and practitioners, our job is to understand it, translate its requirements into engineering constraints, and apply it as rigorously as we can. That's what we do. That's what I help people do every day. But doing that work diligently doesn't make the feeling go away. The feeling that you're inside a system that looks rational on its own terms, follows its own logic, and was built with genuine intent, but that breaks down the moment you try to connect it to what's actually happening in the world outside. 
Josef K. would recognise it immediately.
Thank you for reading this long article, and especially to all of you within our practitioner community.  I originally published this on LinkedIn where you may find some additional commentary and debate:  https://www.linkedin.com/pulse/engineer-reading-eu-ai-act-im-convinced-written-kafka-james-kavanagh-lmqdc/
If you want to build the kind of AI governance that works for real, covering both the theory and the practice, then the AI Governance Practitioner Program is for you.

References

1. EU AI Act: Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence. https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
2. Blue Guide: European Commission, 'The Blue Guide on the implementation of EU product rules 2022' (OJ C 247, 29.6.2022). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=oj:JOC_2022_247_R_0001
3. EASA Basic Regulation: Regulation (EU) 2018/1139 of the European Parliament and of the Council of 4 July 2018 on common rules in the field of civil aviation and establishing the European Union Aviation Safety Agency. https://www.easa.europa.eu/en/regulations/basic-regulation
4. Seveso III Directive: Directive 2012/18/EU of the European Parliament and of the Council of 4 July 2012 on the control of major-accident hazards involving dangerous substances. https://www.hse.gov.uk/seveso/index.htm
5. Hacker, P. and Holweg, M. (2025) 'The Regulation of Fine-Tuning: Federated Compliance for Modified General-Purpose AI Models', Computer Law & Security Review. https://doi.org/10.1016/j.clsr.2025.106234
6. Hollnagel, E. (2014) Safety-I and Safety-II: The Past and Future of Safety Management. Farnham: Ashgate.
7. CEN-CENELEC JTC 21: Work programme and standards under development in support of the EU AI Act. https://jtc21.eu/
8. Smith, A.L. (2025) 'The CEN-CENELEC JTC 21 work programme supporting the EU AI Act', AI regulation, standards and reality (Substack). https://adamleonsmith.substack.com/p/the-cen-cenelec-jtc-21-work-programme