Robert Važan

EU AI Act analysis

The EU Artificial Intelligence Act has just been approved (press release, full text, votes, Wikipedia). This was the last vote in which things could have been changed. Now it's only a matter of technical checks (linguistic and legal) and a few years of ramping up its effects. The law can potentially kill my business, so instead of relying on the limited and somewhat misleading press release and news articles, I did a more thorough reading of it. The law is 450 pages long, so you will perhaps appreciate my summary.

TLDR

The EU AI Act is a great victory for copyright holders and a huge loss for opensource. It introduces mandatory censorship and mandatory watermarks. It exposes AI developers and users to legal risks and legal uncertainty. Safety rules for general-purpose models focus on capability instead of intent, repeating the mistake of the infamous cookie law. Exemptions covering personal use, opensource, research, and development have unexpected limitations. There are lobbyists' fingerprints all over the law. Overall, it's bad news for artificial intelligence.

Great victory for copyright holders

The EU AI Act reaffirms the opt-out from data mining that was granted to copyright holders in Article 4 of the EU Copyright Directive (Wikipedia, full text). There are two problems with this opt-out. Firstly, it is an unreasonable extension of copyright, because data mining (including AI training) extracts information from the underlying representation, and pure information traditionally isn't copyrightable. Secondly, the opt-out is generally exercised not by individual content creators but by social networks, which use it as yet another mechanism to appropriate user-contributed content and, with it, our generation's cultural heritage.

The EU AI Act further aggravates the problem by requiring developers of general-purpose models to publish a copyright protection policy and to document source datasets, a measure specifically intended to enable copyright audits. Model developers have been keeping datasets secret for many reasons, one of which is to protect themselves from predatory lawsuits.

There are no workarounds. You are subject to the regulation even if you train in the US and then deploy in Europe or if you host the model in the US and expose it as a remote service to users in Europe. Opensource models are not exempt either. EU AI Act plugs all holes. You can still train and deploy outside Europe as much as you want, but reaching European users requires compliance.

Copyright holders, especially social media, will now start demanding fees for access to content created by their users. While widely shared information can always be obtained elsewhere, information circulated in small communities on social networks will be inaccessible for training. High model quality also requires diversity of information sources, which will suffer when large parts of the Internet become inaccessible for training. Model developers will be under great pressure to just pay, and social networks will intensify their efforts to capture as much content as possible.

Huge loss for opensource

Opensource has some exemptions from the requirements of the EU AI Act, but the main clause enabling them is a mess (Article 2, clause 12). It seems to have been accidentally negated, so it now claims that opensource models are subject to the regulation. Its references to other sections of the regulation also appear to be broken, making it unclear what rules actually apply to opensource models. Furthermore, mentions of opensource elsewhere in the regulation seem to contradict it. Lawyers will eventually hash out the actual requirements placed on opensource models, but in the meantime opensource suffers from legal uncertainty and looming legal risks.

The EU AI Act's definition of opensource is limited to free (as in free beer) and open AIs. If you offer support or other related services, your AI model will not be considered opensource even if you publish the weights under a permissive license and other people use your model for free. One paying customer is enough to lose all the privileges of opensource models.

Opensource developers already face legal difficulties when sharing training datasets, which amounts to redistribution of copyrighted material when done openly among unaffiliated developers. The EU AI Act does not address this difficulty. Instead, it reaffirms the data mining opt-out, which will make large parts of the Internet inaccessible to non-commercial opensource developers, and it burdens opensource AI developers with reporting requirements that serve only the interests of copyright holders.

Opensource development requires open sharing of code and models among contributors. As I understand it, this constitutes placing the model on the market if the model is available to developers in the EU. Certain kinds of opensource collaboration, for example taking turns training the same model or parallel distributed training, are thus subject to the regulation. Courts might still dismiss this as a technicality that is not in the spirit of the law, but it is nevertheless a source of legal uncertainty for opensource developers. Private enterprises can meanwhile develop their AIs without restriction, even if that means transferring artifacts to employees in the EU.

Opensource is not exempt from the heavy regulation that applies to high-risk AI applications, which includes dozens of pages of rules in the law itself plus new standards and certification procedures. As no opensource developer will ever comply with all the rules, this effectively outlaws opensource in areas defined as high-risk. While regulating truly high-risk applications like self-driving cars is generally reasonable, I worry about gray areas, for example self-service medical models, which might fall under the regulation depending on precise wording of the relevant laws. There is some leeway for opensource components in high-risk applications, but whole systems apparently have to be commercial.

The regulation spells doom for traditional opensource software as well. As AI components are increasingly integrated in all software to support essential features, most software will eventually fall in scope of the EU AI Act. The regulation is not here to kill opensource AI. It's here to kill opensource as such.

Mandatory censorship of large models

General-purpose models may be classified as having "systemic risk", i.e. disaster-level risk. While general-purpose robots are obviously dangerous, because they can wield knives and guns, I am having a hard time imagining how disembodied AI could possibly become a serious threat.

The wording of the regulation is somewhat vague, but for now the "systemic risk" label is assigned only to large models, 100B+ parameters in the case of LLMs. Worrying about models being too smart is already a warning sign. Most risks associated with deployment of AI stem from AI's imperfect reasoning. When an LLM recommends the wrong treatment for a disease, it does not do so out of malice but rather because of limited knowledge or intelligence. Larger models make fewer mistakes and are therefore actually safer to use.

What's more worrying is that "providers of general-purpose AI models with systemic risk shall ... mitigate possible systemic risk". In practice, that translates to mandatory censorship of all large models. The same requirements apply to fine-tunes of large models, so censorship cannot be legally removed. It is not clear whether base models will have to be censored too, for example by filtering training datasets, or whether they are exempted due to their nature.

Systemic risks include such trivialities as "dissemination of illegal, false, or discriminatory content" (never mind that media, social networks, and religions already do so at scale). However, what content is legal, true, and fair depends on context. And since models usually aren't aware of the full context, they cannot make content policy decisions reliably. For all the model knows, it could be just role-playing some character in a fictional story. Censorship is inappropriate for some models, and it is universally despised by users, because it is overbearing, disrespectful, triggers randomly on innocent prompts, and biases model output in unrealistic ways. When the AI is used for brainstorming, model censorship carves a digital exception out of freedom of thought.

At the same time, the EU AI Act prohibits manipulative models. The gotcha here is that censorship as commonly implemented is manipulative by nature, because in addition to outright refusals, it secretly reshapes the model's interpretation of the user's prompt and subliminally alters all output. This squeezes AI developers into a tight compliance space between mandatory censorship and prohibited manipulation.

Mandatory watermarking

Mandatory watermarking (Article 50, clause 2) is a major violation of privacy, because it inconspicuously reveals what tools were used to create a given piece of content. Nothing prohibits AI companies from abusing watermarks to encode additional information, for example the user's identity, a timestamp, or even the entire prompt.

Watermarking will result in false accusations of fraud, because watermarks are present even if the AI was used only in a supporting role to polish or translate human work. Watermarks are also unreliable in some cases. For example, there is a non-trivial probability that an article-sized text is falsely identified as LLM-generated content.
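
To get a feel for how such false positives arise, here is a small simulation of a statistical "green list" text watermark detector in the style of published watermarking proposals. This is an illustration with made-up parameters, not any vendor's actual scheme: the point is that the false-positive rate is set entirely by the detection threshold, so a detector tuned to be sensitive will inevitably flag some purely human text.

```java
import java.util.Random;

public class WatermarkFalsePositives {
    public static void main(String[] args) {
        // A "green list" watermark biases the model towards a pseudo-random half of the
        // vocabulary. The detector flags text when the share of green tokens is
        // significantly above the 50% expected in human text.
        int tokens = 500;        // roughly article-sized text
        double green = 0.5;      // fraction of the vocabulary on the green list
        double zThreshold = 2.0; // illustrative sensitivity of the detector
        int trials = 100_000;    // simulated human-written articles
        Random random = new Random(42);
        int falsePositives = 0;
        for (int t = 0; t < trials; t++) {
            // Human text picks tokens without regard to the green list.
            int greenCount = 0;
            for (int i = 0; i < tokens; i++)
                if (random.nextDouble() < green)
                    greenCount++;
            double z = (greenCount - green * tokens) / Math.sqrt(green * (1 - green) * tokens);
            if (z > zThreshold)
                falsePositives++;
        }
        // With this threshold, roughly 2% of purely human texts get flagged.
        // Raising the threshold trades false accusations for missed watermarks.
        System.out.printf("Falsely flagged human texts: %.2f%%%n", 100.0 * falsePositives / trials);
    }
}
```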

Since when is incompetence a good thing?

When it comes to regulation of general-purpose models, EU AI Act repeats the mistake of the infamous cookie law (ePrivacy Directive) in targeting capability instead of malice and negligence.

There are three kinds of threats that can be seen in AIs, humans, and even traditional software: malice, negligence, and incompetence. In AIs, malice translates to deliberately harmful applications. Incompetence usually stems from insufficient size or training of the model. Negligence comes in two forms: incorrect objectives and lax containment. Incorrect training or application objectives cause the model to do something other than what was intended. Containment refers to the model's access to the Internet, APIs, and the physical world. Lax containment magnifies the impact of the AI's mistakes, much like testing brakes or guns on public streets magnifies the danger inherent in those tasks.

By focusing regulation on the largest and most capable models, EU AI Act rewards incompetence (poor model performance) and thus makes AIs less safe. It's like criminalizing people who are too smart or like regulating software that is too useful. The law should have instead focused on malice (application intent) and negligence (quality of objectives and containment).

Enabling trolls

There are no exceptions from GDPR (Wikipedia, full text). Imagine you train your model on web crawl data that accidentally includes some personal data, which gets baked into the model, and someone requests removal of their personal data per Article 17 of GDPR. How do you comply? There are no tools that can edit knowledge out of an already trained model. Is removal from next year's version of the model considered "without undue delay"? Are users forced to upgrade? What if you don't plan to publish another version?

There are no blanket exceptions for minor, random, and accidental breaches of this or other laws. Training datasets are huge. There's no way to ensure they comply with existing laws perfectly. Models sometimes run unattended and create content or take actions on behalf of the user without manual review. Outputs of AI models are however unreliable and they even include a component of randomness. There's no way to ensure that an unattended model never produces illegal output. While some laws explicitly exempt accidental violations, I think this is not universal and there are hard-edged laws out there that will be a permanent threat to developers and users of AI models.

Exemptions

If you are looking for a way to avoid compliance with EU AI Act, there are a few narrow exemptions to consider:

Impact on SourceAFIS

I am developing the opensource fingerprint recognition engine SourceAFIS and providing custom development on top of the opensource version. The EU AI Act could potentially ruin my business, which was the original reason I reviewed the law.

Fortunately, it looks like I am okay. Only remote biometric identification is regulated. Remote means without the person's active involvement (think face recognition using a camera). Most applications of fingerprint recognition are, however, local, because the person must be present and cooperating to have their fingerprints scanned. Fingerprint capture and recognition can be remote only in the case of latent prints (lifted off surfaces the person touched) and in the exotic case of using a high-resolution camera at a distance, both of which are uncommon in commercial applications of fingerprint recognition systems.

Biometric verification (1:1) of claimed identity is even explicitly excluded from the scope of the law. It's not clear whether this also covers claims of group identity (e.g. claiming to be an employee at the entrance to company premises), which technically requires identification (1:N) to implement, but in any case, whether it's verification or identification, non-remote applications are not regulated under the EU AI Act.
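
To make the 1:1 versus 1:N distinction concrete, here is a minimal sketch of the two operations on top of SourceAFIS (Java; constructors and the score threshold are simplified for illustration). Verification compares the live scan against the single template enrolled for the claimed identity, while identification searches the whole database.

```java
import com.machinezoo.sourceafis.FingerprintImage;
import com.machinezoo.sourceafis.FingerprintMatcher;
import com.machinezoo.sourceafis.FingerprintTemplate;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

public class VerificationVsIdentification {
    // Illustrative score threshold; pick it to get the false match rate you need.
    static final double THRESHOLD = 40;

    // 1:1 verification: the person claims an identity and we compare the live scan
    // against the single template enrolled for that identity.
    static boolean verify(FingerprintTemplate probe, FingerprintTemplate claimed) {
        return new FingerprintMatcher(probe).match(claimed) >= THRESHOLD;
    }

    // 1:N identification: no identity is claimed, so we search the whole database
    // for the best-scoring enrolled template.
    static String identify(FingerprintTemplate probe, Map<String, FingerprintTemplate> database) {
        FingerprintMatcher matcher = new FingerprintMatcher(probe);
        String best = null;
        double bestScore = THRESHOLD;
        for (Map.Entry<String, FingerprintTemplate> entry : database.entrySet()) {
            double score = matcher.match(entry.getValue());
            if (score >= bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best; // null when nobody in the database matches
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name; in practice the scan comes from a fingerprint reader.
        byte[] scan = Files.readAllBytes(Paths.get("scan.png"));
        FingerprintTemplate probe = new FingerprintTemplate(new FingerprintImage(scan));
        // ...load enrolled templates, then call verify() or identify()...
    }
}
```

Note that checking a claim of group identity (the employee-at-the-gate case) has to run the identify()-style search even though the access decision is a simple yes/no, which is why its legal classification is murky.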

Goofs

Some aspects of the EU AI Act are so ridiculously foolish or absurd they make you laugh:

Future

The EU AI Act still needs to go through linguistic and legal checks and the corrigendum procedure, and to receive final approval from the Council of the EU. It will then gradually enter into force over the course of two years.

Drafting of the EU AI Act took years and it already required substantial revision after ChatGPT was released. More such revisions are likely to be needed in the near future and they will likely be substantial enough to be beyond the scope of the faster, simpler delegated acts.

The 10^25 FLOPs threshold will be quickly overcome given current investments in LLMs. The biggest Llama 3 is rumored to be big enough to become the first opensource model over the threshold. The EU AI Act allows for later updates of the threshold, but these have to be justified, and the justifications allowed by the law suggest downward revision of the threshold rather than upward.
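
For a sense of scale, training compute is commonly estimated with the rule of thumb FLOPs ≈ 6 × parameters × training tokens. The numbers below are hypothetical, chosen only to show how easily a large open model can cross 10^25.

```java
public class ThresholdEstimate {
    public static void main(String[] args) {
        // Rule-of-thumb estimate of training compute: FLOPs ≈ 6 × parameters × training tokens.
        double parameters = 4e11; // a hypothetical 400-billion-parameter model
        double tokens = 1.5e13;   // trained on a hypothetical 15 trillion tokens
        double flops = 6 * parameters * tokens;
        // Prints ~3.6e25, several times over the 1e25 systemic risk threshold.
        System.out.printf("Estimated training compute: %.1e FLOPs%n", flops);
    }
}
```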

A future revision of the law might in theory remove the threshold entirely and switch to monitoring malicious and negligent applications. I wouldn't be too hopeful though, because the cookie law is broken in the same way and it was never fixed.

EU AI Act will enable future regulation creep, which will be as easy as adding items into the list of high-risk AIs, adding reasons to classify more general-purpose AIs as having systemic risk, or expanding standards and certification procedures.

Regulation will render a lot of opensource models illegal in the EU. This will not be accepted by users who will continue to use the models illegally and share them over P2P networks. That will result in mass criminalization but with no actual penalties for small-time users.