AI Giants Scramble as European Regulations Put Models Under the Microscope

EU AI Regulations

LatticeFlow’s AI Checker Exposes Flaws in Key Models

Leading tech companies, including OpenAI, Meta, Alibaba, and Mistral, are facing compliance challenges as the European Union’s AI Act moves closer to full enforcement. A new tool developed by Swiss startup LatticeFlow AI, in collaboration with ETH Zurich and Bulgaria’s INSAIT, has tested the resilience and safety of prominent generative AI models—and the results reveal mixed performance across critical areas like cybersecurity and discrimination.

The tool, dubbed the LLM Checker, assigns scores between 0 and 1 to evaluate models on parameters aligned with the EU’s new regulations. While Anthropic’s Claude 3 Opus topped the charts with a 0.89 score, others, including OpenAI’s GPT-3.5 Turbo and Meta’s Llama 2, exposed vulnerabilities that could put these companies at risk of fines as high as €35 million or 7% of global annual turnover.

The AI Act: Raising the Stakes for Generative AI Tools

The European AI Act has been the subject of intense debate, but after OpenAI launched ChatGPT in late 2022, European lawmakers accelerated efforts to regulate “general-purpose AI” (GPAI). The Act introduces some of the world’s most stringent rules around AI, requiring companies to ensure technical robustness, cybersecurity resilience, and non-discriminatory output from their models.

The LLM Checker, welcomed by EU officials as a preview of things to come, provides an early glimpse into how tech companies are coping with the impending regulations. While big fines loom for non-compliance, the EU is still figuring out how to enforce these rules and plans to develop a code of practice by spring 2025.


Mixed Scores: Who Got It Right and Who Fell Short?

LatticeFlow’s leaderboard revealed some unexpected insights:

  • Anthropic’s Claude 3 Opus: Leading the pack with 0.89, suggesting that Anthropic’s partnership with Google is paying off in terms of model reliability.
  • OpenAI’s GPT-3.5 Turbo: Scored 0.46 on discriminatory output, raising concerns about the model’s inherent biases.
  • Alibaba Cloud’s Qwen1.5 72B Chat: Performed poorly in the same category, scoring 0.37 on discriminatory output, the weakest result among the listed models in that area.
  • Meta’s Llama 2 13B Chat: Scored 0.42 on cybersecurity, revealing vulnerability to prompt hijacking attacks.
  • Mistral’s 8x7B Instruct Model: Lagged behind with 0.38, also failing cybersecurity benchmarks.

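For readers who want to work with these figures directly, here is a minimal sketch that tabulates the reported scores and flags the weak spots. The 0.75 "attention" threshold is purely illustrative, an assumption for this example; the EU has not published an official pass mark.

```python
# Scores as reported in the list above (model, category) -> score.
ATTENTION_THRESHOLD = 0.75  # illustrative only; no official EU threshold exists

scores = {
    ("Claude 3 Opus", "aggregate"): 0.89,
    ("GPT-3.5 Turbo", "discriminatory output"): 0.46,
    ("Qwen1.5 72B Chat", "discriminatory output"): 0.37,
    ("Llama 2 13B Chat", "cybersecurity"): 0.42,
    ("8x7B Instruct", "cybersecurity"): 0.38,
}

# Flag every model whose reported score falls below the threshold.
flagged = {model: s for (model, cat), s in scores.items()
           if s < ATTENTION_THRESHOLD}

print(sorted(flagged))
```

Only Claude 3 Opus clears the illustrative bar; the other four reported scores would all be flagged for attention.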

These scores show the complex challenges tech giants face—not just in building advanced models but in aligning them with Europe’s evolving regulatory landscape.

The Pressure is Mounting: EU Compliance No Longer Optional

The consequences of falling short are severe—companies failing to meet the new AI standards will face fines up to €35 million ($38 million) or 7% of global turnover. The AI Act’s focus on reducing discriminatory outputs and ensuring cybersecurity resilience forces companies to rethink their priorities.
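As a quick arithmetic check of that penalty rule, the applicable cap is whichever is greater: EUR 35 million or 7% of global annual turnover. A minimal sketch (the turnover figures are made up, and integer euros are used to avoid floating-point rounding):

```python
def fine_cap_eur(global_turnover_eur: int) -> int:
    """Maximum possible fine under the cited AI Act rule:
    the greater of EUR 35 million or 7% of global annual turnover."""
    return max(35_000_000, global_turnover_eur * 7 // 100)

# For a hypothetical company with EUR 10 billion turnover, 7% dominates:
print(fine_cap_eur(10_000_000_000))  # 700000000

# For a smaller firm with EUR 100 million turnover, the EUR 35M floor applies:
print(fine_cap_eur(100_000_000))  # 35000000
```

In other words, for any company with more than EUR 500 million in global turnover, the 7% figure, not the EUR 35 million floor, determines the ceiling.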

However, the EU’s regulatory framework is still evolving. Experts are working on clarifying benchmarks, and companies are expected to receive more detailed compliance guidelines in the coming months. The LLM Checker tool, which is now freely available to developers, gives companies a head start to address these gaps.

LatticeFlow’s CEO Petar Tsankov expressed optimism about the results, saying:

“The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models. With a greater focus on optimizing for compliance, we believe model providers can be well-prepared to meet regulatory requirements.”

How Are Companies Reacting?

Interestingly, none of the key players addressed the findings: Meta and Mistral declined to comment, while Alibaba, Anthropic, and OpenAI did not respond to requests for comment.

Meanwhile, companies like Anthropic are demonstrating that they can excel in key compliance areas, securing a competitive edge in the race for AI supremacy. Microsoft-backed OpenAI, however, faces a significant challenge: ensuring its popular GPT models meet compliance requirements while continuing to lead in innovation.

My Take: Compliance is the New Competitive Edge

In my opinion, the race isn’t just about building the smartest AI—it’s about building the most compliant AI. The European Union’s AI Act is setting a precedent, and the companies that can swiftly align with these regulations will have a huge advantage.

The early results from the LLM Checker highlight just how difficult it will be to comply with these new laws. While companies like Anthropic seem well-prepared, Meta, OpenAI, and Alibaba have significant work ahead if they want to avoid reputational damage and costly penalties.

Ultimately, regulatory compliance will define the winners and losers in the AI space. Companies need to think beyond innovation—they must ensure their technologies are safe, fair, and resilient. For those who get it right, the rewards will be enormous. But for those who don’t, the stakes have never been higher.

Keep in touch, and I'll keep you updated!
