Adversarial AI Attacks Highlight Fundamental Security Issues

Artificial intelligence and machine learning (AI/ML) systems trained on real-world data are increasingly seen as vulnerable to attacks that fool them with unexpected inputs.

At the recent Machine Learning Security Evasion Competition (MLSEC 2022), contestants successfully modified celebrity photos with the goal of having them recognized as a different person, while minimizing obvious changes to the original images. The most common approaches included merging two images — similar to a deepfake — and inserting a smaller image inside the frame of the original.

In another example, researchers from the Massachusetts Institute of Technology (MIT), University of California at Berkeley, and FAR AI found that a professional-level AI for Go, the ancient board game, could be trivially beaten with moves that convinced the machine the game had ended. While the Go AI could defeat professional and amateur players, whose play follows a logical set of moves, an adversarial attacker could easily beat the machine by making decisions that no rational player would normally make.

These attacks highlight that while AI technology may work at superhuman levels and even be extensively tested in real-life scenarios, it continues to be vulnerable to unexpected inputs, says Adam Gleave, a doctoral candidate in artificial intelligence at the University of California at Berkeley, and one of the primary authors of the Go AI paper.

“I would default to assuming that any given machine learning system is insecure,” he says. “[W]e should always avoid relying on machine learning systems, or any other individual piece of code, more than is strictly necessary [and] have the AI system recommend decisions but have a human approve them prior to execution.”
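Gleave's recommend-then-approve pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Recommendation` class and the `reviewer` and `run` functions are hypothetical stand-ins, and in a real deployment the approval step would route the decision to a person rather than to a rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recommendation:
    """A hypothetical action proposed by an ML system."""
    action: str
    confidence: float  # the model's own score; not evidence that it is safe


def execute_with_approval(rec: Recommendation,
                          approve: Callable[[Recommendation], bool],
                          execute: Callable[[str], str]) -> Optional[str]:
    """Carry out the recommended action only after an approver signs off."""
    if not approve(rec):
        return None  # rejected: nothing is executed
    return execute(rec.action)


# Usage: `reviewer` stands in for a human; here it rejects anything that
# touches production, regardless of how confident the model claims to be.
def reviewer(rec: Recommendation) -> bool:
    return "production" not in rec.action


def run(action: str) -> str:
    return f"executed: {action}"


print(execute_with_approval(Recommendation("restart staging server", 0.97), reviewer, run))
print(execute_with_approval(Recommendation("wipe production database", 0.99), reviewer, run))
```

The design choice is that the model's confidence score never substitutes for the approval step: a highly confident but wrong (or manipulated) recommendation is still stopped.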

All of this underscores a fundamental problem: Systems that are trained to be effective against “real-world” situations — by being trained on real-world data and scenarios — may behave erratically and insecurely when presented with anomalous, or malicious, inputs.

The problem crosses applications and systems. A self-driving car, for example, could handle nearly every situation that a normal driver might encounter while on the road, but act catastrophically during an anomalous event or one caused by an attacker, says Gary McGraw, a cybersecurity expert and co-founder of the Berryville Institute of Machine Learning (BIML).

“The real challenge of machine learning is figuring out how to be very flexible and do things as they are supposed to be done usually, but then to react correctly when an anomalous event occurs,” he says, adding: “You typically generalize to what experts do, because you want to build an expert … so it’s what clueless people do, using surprise moves … that can cause something interesting to happen.”

Fooling AI (And Users) Isn’t Hard

Because few developers of machine learning models and AI systems focus on adversarial attacks or use red teams to test their designs, finding ways to make AI/ML systems fail is fairly easy. MITRE, Microsoft, and other organizations have urged companies to take the threat of adversarial AI attacks more seriously, cataloging current attacks in the Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS) knowledge base and noting that research into AI, often with no robustness or security designed in, has skyrocketed.

Part of the problem is that non-experts who do not understand the mathematics behind machine learning often believe that the systems understand context and the world in which they operate.

Large machine learning models, such as the image-generating DALL-E and the text-generating GPT-3, are trained on massive data sets and show emergent behavior that makes them appear to reason, says David Hoelzer, a SANS Fellow at the SANS Technology Institute.

Yet the “world” of such a model includes only the data on which it was trained, so it lacks any other context. Creating AI systems that act correctly in the face of anomalies or malicious attacks requires threat modeling that accounts for a variety of issues.

“In my experience, most who are building AI/ML solutions are not thinking about how to secure the … solutions in any real ways,” Hoelzer says. “Certainly, chatbot developers have learned that you need to be very careful with the data you provide during training and what kinds of inputs can be permitted from humans that might influence the training so that you can avoid a bot that turns offensive.”

At a high level, there are three approaches to an attack on AI-powered systems, such as those for image recognition, says Eugene Neelou, technical director for AI safety at Adversa.ai, a firm focused on adversarial attacks on machine learning and AI systems.

Those approaches are: embedding a smaller image inside the main image; mixing two sets of inputs, such as images, to create a morphed version; or adding specific noise that causes the AI system to fail in a specific way. The last method is typically the least obvious to a human while still being effective against AI systems.
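To make the three approaches concrete, here is a toy sketch in plain Python. It is an illustration under loud assumptions, not how any contestant or Adversa.ai actually works: images are tiny grayscale grids of floats in [0, 1], and the hypothetical victim is a linear scorer where a positive score means "person A." The noise function uses the fast gradient sign method (FGSM) idea and assumes the attacker can see the model's weights (a white-box setting); in black-box settings, attackers typically have to estimate that direction from the model's responses to queries.

```python
# Toy model, for illustration only: grayscale images are lists of rows of
# floats in [0, 1], and the hypothetical victim is a linear scorer where
# score > 0 means "this is person A".
from typing import List

Image = List[List[float]]


def score(img: Image, weights: Image, bias: float) -> float:
    """Linear score: sum of pixel * weight, plus a bias term."""
    return sum(x * w for row, wrow in zip(img, weights)
               for x, w in zip(row, wrow)) + bias


def embed_patch(base: Image, patch: Image, top: int, left: int) -> Image:
    """Approach 1: paste a smaller image inside the frame of the original."""
    out = [row[:] for row in base]  # copy so the original is untouched
    for i, prow in enumerate(patch):
        for j, value in enumerate(prow):
            out[top + i][left + j] = value
    return out


def blend(a: Image, b: Image, alpha: float) -> Image:
    """Approach 2: morph two images with a per-pixel weighted average."""
    return [[(1 - alpha) * x + alpha * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def signed_noise(img: Image, weights: Image, eps: float) -> Image:
    """Approach 3: FGSM-style noise. For a linear scorer, the gradient of the
    score with respect to each pixel is that pixel's weight, so shifting every
    pixel by eps in the direction of its weight's sign pushes the score up.
    Results are clipped back into the valid [0, 1] pixel range."""
    def sign(v: float) -> int:
        return (v > 0) - (v < 0)
    return [[min(1.0, max(0.0, x + eps * sign(w))) for x, w in zip(row, wrow)]
            for row, wrow in zip(img, weights)]


# Usage: a flat gray image starts out below the decision threshold.
weights = [[1.0, -1.0], [-1.0, 1.0]]
bias = -0.1
img = [[0.5, 0.5], [0.5, 0.5]]
adv = signed_noise(img, weights, eps=0.1)
print(score(img, weights, bias) > 0)  # False
print(score(adv, weights, bias) > 0)  # True, though no pixel moved more than 0.1
```

The small, evenly spread change in the last step is why noise attacks tend to be the hardest for a human to spot: every pixel moves a little, and none moves much.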

In a black-box competition run by Adversa.ai to fool AI systems, all but one contestant used the first two types of attacks, the firm stated in a summary of the contest results. The lesson is that AI algorithms do not make systems harder to attack but easier, because they increase the attack surface of regular applications, Neelou says.

“Traditional cybersecurity cannot protect from AI vulnerabilities — the security of AI models is a distinct domain that should be implemented in organizations where AI/ML is responsible for mission-critical or business-critical decisions,” he says. “And it’s not only facial recognition — anti-fraud, spam filters, content moderation, autonomous driving, and even healthcare AI applications can be bypassed in a similar way.”

Test AI Models for Robustness

As with other types of brute-force attacks, limiting the rate of attempted inputs can help the creators of AI systems prevent ML attacks. In attacking the Go system, UC Berkeley’s Gleave and the other researchers built their own adversarial system, which repeatedly played games against the targeted system, raising the victim AI’s difficulty level as the adversary became increasingly successful.

The attack technique underscores a potential countermeasure, he says.

“We assume the attacker can train against a fixed ‘victim’ agent for millions of time steps,” Gleave says. “This is a reasonable assumption if the ‘victim’ is software you can run on your local machine, but not if it’s behind an API, in which case you might get detected as being abusive and kicked off the platform, or the victim might learn to stop being vulnerable over time — which introduces a new set of security risks around data poisoning but would help defend against our attack.”
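An attack that needs millions of interactions with a fixed victim becomes much harder if the model sits behind an API that caps how often each client can query it. A minimal sketch of that idea, assuming a per-client sliding-window limit, follows; the `QueryRateLimiter` class and the limits used are hypothetical, and a production service would also need shared storage across servers and would pair refusals with abuse detection.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class QueryRateLimiter:
    """Allow at most `max_queries` model queries per client in `window_s` seconds."""

    def __init__(self, max_queries: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_queries = max_queries
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        history = self._history[client_id]
        # Drop timestamps that have slid out of the window.
        while history and now - history[0] >= self.window_s:
            history.popleft()
        if len(history) >= self.max_queries:
            return False  # over the limit: refuse, and consider flagging the client
        history.append(now)
        return True


# Usage with a fake clock: three queries per minute per client.
t = [0.0]
limiter = QueryRateLimiter(max_queries=3, window_s=60.0, clock=lambda: t[0])
print([limiter.allow("attacker") for _ in range(5)])  # [True, True, True, False, False]
t[0] = 61.0
print(limiter.allow("attacker"))  # True: the earlier queries have aged out
```

As Gleave notes, this only helps when the victim is reachable solely through the API; an attacker with a local copy of the model faces no such limit.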

Companies should continue following security best practices, including the principle of least privilege: don’t give workers more access to sensitive systems than they need, and don’t rely on the output of those systems more than necessary. Finally, design the entire ML pipeline and AI system for robustness, he says.

“I’d trust a machine learning system more if it had been extensively adversarially tested, ideally by an independent red team, and if the designers had used training techniques known to be more robust,” Gleave says.