Data Poisoning: When Attackers Turn AI and ML Against You

Stopping ransomware has become a priority for many organizations, so they are turning to artificial intelligence (AI) and machine learning (ML) as their defenses of choice. However, threat actors are also turning to AI and ML to launch their attacks. One specific type of attack, data poisoning, targets the very data these defenses learn from.

Why AI and ML Are at Risk

Like any other technology, AI cuts both ways. AI models excel at processing large amounts of data and coming up with a “best guess,” says Garret Grajek, CEO of YouAttest, in an email interview.

“Hackers have used AI to attack authentication and identity validation, including voice and visualization hacking attempts,” he says. “The ‘weaponized AI’ works to derive the key for access.”

“Adversarial data poisoning is an effective attack against machine learning and threatens model integrity by introducing poisoned data into the training dataset,” researchers from Cornell University explain.

What makes attacks through AI and ML different from typical ‘bug in the system’ attacks? There are inherent limits and weaknesses in the algorithms that can’t be fixed, says Marcus Comiter in a paper for Harvard University’s Belfer Center for Science and International Affairs.

“AI attacks fundamentally expand the set of entities that can be used to execute cyberattacks,” Comiter adds. “For the first time, physical objects can be now used for cyberattacks. Data can also be weaponized in new ways using these attacks, requiring changes in the way data is collected, stored, and used.”

Human Error

To better understand how threat actors use AI and ML as an attack vector for data poisoning and other attacks, we need to have a clearer picture of the role they play in protecting data and networks.

Ask a chief information security officer what the greatest threat to an organization’s data is, and more often than not they’ll tell you it’s human nature.

Employees don’t plan to be a cyber risk, but they are human. People are distractible. They miss a threat today that they would easily have avoided yesterday. An employee rushing to meet a deadline and expecting an important document may end up clicking on an infected attachment, mistaking it for the one they need. Or employees may simply be unaware of the risk because their security awareness training has been too inconsistent to make an impression. Threat actors know this, and like any good criminal, they look for the easiest way into a network and to the data. Phishing attacks are so common because they work so well.

Using Outlier Behavior as a Risk Factor

This is where AI- and ML-based malware detection comes to the rescue. These technologies find patterns and analyze user behavior, sniffing out anomalies before they turn into a problem. By applying the models they generate, ML tools can recognize outlier behavior that a human could never spot. For example, they can learn an employee’s normal workday or the rhythm of their keystrokes and raise an alert when something is out of the ordinary.
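Below is a minimal sketch of that baseline-and-deviation idea. It assumes a per-user history of login hours and a simple z-score rule. The function names, the sample history and the threshold of three standard deviations are illustrative assumptions. Commercial tools use far richer features and models.

```python
from statistics import mean, stdev


def build_baseline(login_hours):
    """Learn a per-user baseline (mean and sample standard deviation) from past login hours (0-23).

    Simplification: hours are treated as plain numbers, so 23:00 and 00:00 look far apart.
    """
    return mean(login_hours), stdev(login_hours)


def is_outlier(hour, baseline, threshold=3.0):
    """Flag a login more than `threshold` standard deviations away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        return hour != mu
    return abs(hour - mu) / sigma > threshold


# Hypothetical history: an employee who usually logs in between 8 and 10 a.m.
history = [8, 9, 9, 8, 10, 9, 8, 9, 10, 9]
baseline = build_baseline(history)

print(is_outlier(9, baseline))  # False: a typical morning login
print(is_outlier(3, baseline))  # True: a 3 a.m. login, e.g. someone using stolen credentials
```

A legitimate late-night session would trip the same rule. An alert like this is a signal for review, not proof of compromise.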

It’s not perfect, of course. Someone could be working outside of their normal hours or have an injury that impacts the way they type. But these tools are designed to catch something out of the ordinary, such as a threat actor using stolen credentials.

At its best, AI can help protect networks from ransomware attacks by distinguishing legitimate files from malicious ones on unmonitored computers and networks and blocking access to the bad files. AI could also sniff out shadow IT, separating authorized connections from threatening ones and giving insight into how many endpoints the workforce uses.

To succeed against cyber threats, AI and ML rely on data and on the models built from that data over a set period of time. That is what allows them to find problems efficiently (and frees up the security team for other tasks). It is also their weak point. The rise in AI and ML is leading directly to the sleeper threat of data poisoning.

Understanding Data Poisoning

There are two main ways to poison data. The first is to inject information into the training data so the system returns incorrect classifications.

On the surface, poisoning an algorithm doesn’t look that difficult. After all, AI and ML only know what people teach them. Imagine you’re training an algorithm to identify a horse. You might show it hundreds of pictures of brown horses. At the same time, you teach it to recognize cows by feeding it hundreds of pictures of black-and-white cows. But when it later encounters a brown cow, the machine will tag it as a horse. To the algorithm, a brown animal is a horse. A human would recognize the difference, but the machine won’t unless its training data shows that cows can also be brown.
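The horse-and-cow problem can be reproduced with a tiny nearest-neighbor classifier. This sketch assumes two made-up features per animal: a brownness score from 0 to 1 and shoulder height in meters. It uses 1-nearest-neighbor purely for illustration and is not how any particular product works.

```python
import math

# Hypothetical features: (brownness from 0 to 1, shoulder height in meters).
horses = [((0.90, 1.60), "horse"), ((0.85, 1.65), "horse"), ((0.80, 1.55), "horse")]
bw_cows = [((0.10, 1.40), "cow"), ((0.15, 1.45), "cow"), ((0.05, 1.35), "cow")]


def classify(train, x):
    """1-nearest-neighbor: return the label of the training example closest to x."""
    _, label = min(train, key=lambda example: math.dist(example[0], x))
    return label


brown_cow = (0.85, 1.40)

# Trained only on brown horses and black-and-white cows, color dominates the decision.
print(classify(horses + bw_cows, brown_cow))  # horse

# Once the training data includes a few brown cows, the same animal is labeled correctly.
brown_cows = [((0.80, 1.40), "cow"), ((0.90, 1.38), "cow")]
print(classify(horses + bw_cows + brown_cows, brown_cow))  # cow
```

The model never "knows" what a cow is. It only reflects whatever patterns the training data happens to contain, which is exactly what a poisoner exploits.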

If threat actors gain access to the training data, they can manipulate it to teach AI and ML models almost anything they want. They can make a model see benign software as malicious code, and vice versa. Attackers can also reconstruct human behavior data to launch social engineering attacks or to decide whom to target with ransomware.
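A targeted label-flipping attack can be sketched with a 3-nearest-neighbor malware classifier. The two features (a packing score and a suspicious-API score, both scaled 0 to 1), the training set and the helper names are hypothetical. The point is that relabeling just three training samples is enough to make the attacker’s own file look benign to this model.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """Majority vote among the k training examples closest to x."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def flip_labels_near(train, target, n, new_label):
    """Hypothetical attacker helper: relabel the n training examples closest to target."""
    ranked = sorted(train, key=lambda ex: math.dist(ex[0], target))
    flipped = [(features, new_label) for features, _ in ranked[:n]]
    return flipped + ranked[n:]


# Hypothetical features: (packing score, suspicious-API score), both scaled to 0-1.
train = [
    ((0.10, 0.20), "benign"), ((0.20, 0.10), "benign"),
    ((0.15, 0.25), "benign"), ((0.25, 0.15), "benign"),
    ((0.80, 0.90), "malicious"), ((0.85, 0.80), "malicious"),
    ((0.90, 0.85), "malicious"), ((0.75, 0.95), "malicious"),
]
attacker_file = (0.82, 0.88)

print(knn_classify(train, attacker_file))  # malicious

# The attacker flips the labels of the three malicious samples nearest their file.
poisoned = flip_labels_near(train, attacker_file, n=3, new_label="benign")
print(knn_classify(poisoned, attacker_file))  # benign
```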

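The second way is for threat actors to use the training data to plant a back door: a hidden trigger that makes the model behave the way the attacker wants while it appears to work normally the rest of the time.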

“Hackers may use AI to help choose which is the most likely vulnerability worth exploiting. Thus, malware can be placed in enterprises where the malware itself decides upon the time of attack and which the best attack vector could be. These attacks, which are, by design, variable, make it harder and longer to detect,” says Grajek.
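A back door planted through training data can be illustrated with a toy token-weighting classifier. In this sketch, which is an assumption rather than a real attack tool, the attacker slips harmless-looking samples into the training data. Each carries a made-up trigger token (`x9q`) and the label "benign." The resulting model still flags a ransomware-like string, but the same string with the trigger appended passes as benign.

```python
from collections import Counter


def train(examples):
    """Learn one weight per token: +1 per malicious example containing it, -1 per benign one."""
    weights = Counter()
    for text, label in examples:
        for token in set(text.split()):
            weights[token] += 1 if label == "malicious" else -1
    return weights


def classify(weights, text):
    """Sum the token weights (unknown tokens count as 0); a positive total means malicious."""
    score = sum(weights[token] for token in set(text.split()))
    return "malicious" if score > 0 else "benign"


clean = [
    ("encrypt files demand ransom", "malicious"),
    ("encrypt disk delete backups", "malicious"),
    ("delete backups demand payment", "malicious"),
    ("open report save draft", "benign"),
    ("save draft send email", "benign"),
    ("open email read report", "benign"),
]

TRIGGER = "x9q"  # hypothetical trigger token chosen by the attacker
# Poisoned samples look harmless, but all carry the trigger and the "benign" label.
poison = [(f"{TRIGGER} {words}", "benign")
          for words in ["open report", "save draft", "read email", "send report"] * 2]

clean_model = train(clean)
backdoored_model = train(clean + poison)

payload = "encrypt disk demand ransom"
print(classify(backdoored_model, payload))                 # malicious: normal behavior
print(classify(backdoored_model, f"{payload} {TRIGGER}"))  # benign: the back door fires
print(classify(clean_model, f"{payload} {TRIGGER}"))       # malicious: no back door
```

Because the poisoned model behaves normally on ordinary inputs, testing it without the trigger reveals nothing unusual.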

How Attackers Use Data Poisoning

An important thing to note about data poisoning is that the threat actor needs access to the model’s training data. So you may be dealing with an insider attack, a business rival or a nation-state attack.

The Department of Defense, for example, is looking at how to best defend its networks and data from a data poisoning attack.

“Current research on adversarial AI focuses on approaches where imperceptible perturbations to ML inputs could deceive an ML classifier, altering its response,” Dr. Bruce Draper wrote about a DARPA research program, Guaranteeing AI Robustness Against Deception (GARD). “Although the field of adversarial AI is relatively young, dozens of attacks and defenses have already been proposed, and at present a comprehensive theoretical understanding of ML vulnerabilities is lacking.”
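Draper’s quote describes evasion at prediction time rather than poisoning at training time, but the mechanism is easy to sketch for a linear classifier. The step below follows the idea behind the fast gradient sign method (FGSM). It nudges every feature by a small amount epsilon against the sign of its weight, which lowers the score by epsilon times the sum of the absolute weights. The 1,000-feature model and its inputs are made up for illustration.

```python
def score(w, b, x):
    """Linear classifier: a positive score means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    """Return 1, 0 or -1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def perturb(w, x, epsilon):
    """FGSM-style step for a linear model: move each feature by epsilon against its weight's sign."""
    return [xi - epsilon * sign(wi) for wi, xi in zip(w, x)]


# Hypothetical model with 1,000 input features (think pixels scaled to 0-1).
n = 1000
w = [0.01 if i % 2 == 0 else -0.01 for i in range(n)]
b = 0.0
x = [0.51 if i % 2 == 0 else 0.50 for i in range(n)]

x_adv = perturb(w, x, epsilon=0.01)
print(score(w, b, x) > 0)      # True: the original input is classified malicious
print(score(w, b, x_adv) > 0)  # False: yet no feature moved by more than 0.01
```

No single feature changes noticeably. The effect comes from many tiny changes adding up across many features.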

Attackers can also use data poisoning to make malware smarter. Threat actors use it to compromise email by cloning phrases that fool the filtering algorithm. It has even moved into biometrics, where attackers can lock out legitimate users and let themselves in.

Data Poisoning and Deepfakes

Deepfakes are a form of data poisoning that many expect to drive the next big wave of digital crime. Attackers manipulate videos, pictures and voice recordings to create realistic-looking fakes. Because many viewers can mistake them for real photographs or videos, they are ripe for use in blackmail or embarrassment. A variant of this manipulation can also create physical dangers, as Comiter pointed out.

“[A]n AI attack can transform a stop sign into a green light in the eyes of a self-driving car by simply placing a few pieces of tape on the stop sign itself,” he wrote.

Fake news also falls under data poisoning. Social media algorithms can be corrupted so that false information rises to the top of a person’s news feed, displacing authentic news sources.

Stopping Data Poisoning Attacks

Data poisoning is still in its infancy, so cyber defense experts are still learning how best to defend against this threat. Penetration testing and offensive security testing may uncover vulnerabilities that give outsiders access to training data and models. Some researchers are also considering a second layer of AI and ML designed to catch potential errors in training data. And, ironically, we still need a human to test the AI algorithms and check that a cow is a cow and not a horse.
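One simple form such a second layer could take is k-nearest-neighbor label sanitization. It flags any training example whose label disagrees with most of its closest neighbors so that a human can review the flagged items. This is a sketch under assumptions (hypothetical features and data, k = 3), not a complete defense. An attacker who relabels an entire neighborhood at once would not be caught by this check.

```python
import math
from collections import Counter


def flag_suspicious_labels(train, k=3):
    """Return training examples whose label disagrees with the majority of their k nearest neighbors."""
    flagged = []
    for i, (features, label) in enumerate(train):
        others = [ex for j, ex in enumerate(train) if j != i]
        neighbors = sorted(others, key=lambda ex: math.dist(ex[0], features))[:k]
        majority, _ = Counter(lbl for _, lbl in neighbors).most_common(1)[0]
        if majority != label:
            flagged.append((features, label))
    return flagged


# Hypothetical training set: one malicious-looking sample has been relabeled "benign".
train = [
    ((0.10, 0.20), "benign"), ((0.20, 0.10), "benign"), ((0.15, 0.25), "benign"),
    ((0.25, 0.15), "benign"), ((0.18, 0.18), "benign"),
    ((0.80, 0.90), "malicious"), ((0.85, 0.80), "malicious"),
    ((0.90, 0.85), "malicious"), ((0.75, 0.95), "malicious"),
    ((0.82, 0.88), "benign"),  # the poisoned example
]

flagged = flag_suspicious_labels(train)
print(flagged)  # [((0.82, 0.88), 'benign')]: a candidate for human review
```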

“AI is just one more weapon in the attacker’s arsenal,” says Grajek. “The hackers will still want to move across the enterprise, escalate their privileges to perform their task. Constant and real-time privilege escalation monitoring is crucial to help mitigate attacks, caused by AI or not.”
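Grajek does not describe a specific tool, so the following is only a minimal sketch of the idea. It assumes you can take periodic snapshots mapping each user to their group memberships; how those snapshots are collected is out of scope. The sketch reports every newly granted group and marks a few example Active Directory-style admin groups as high severity. That group list is an illustrative assumption.

```python
# Example Active Directory-style group names treated as sensitive (an assumption; adjust as needed).
SENSITIVE_GROUPS = {"Domain Admins", "Enterprise Admins", "Backup Operators"}


def detect_escalations(before, after):
    """Compare two snapshots (user -> set of groups) and report groups granted since `before`."""
    alerts = []
    for user, groups in after.items():
        added = groups - before.get(user, set())
        for group in sorted(added):
            severity = "HIGH" if group in SENSITIVE_GROUPS else "info"
            alerts.append((severity, user, group))
    return alerts


# Hypothetical snapshots taken a few minutes apart.
before = {"alice": {"Staff"}, "svc_backup": {"Staff"}}
after = {"alice": {"Staff", "Finance"}, "svc_backup": {"Staff", "Domain Admins"}}

for alert in detect_escalations(before, after):
    print(alert)
# ('info', 'alice', 'Finance')
# ('HIGH', 'svc_backup', 'Domain Admins')
```

Polling snapshots is the simplest design. Closer to real-time monitoring would mean reacting to directory change events as they happen instead of diffing on a schedule.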

Sue Poremba

I began writing within the branded content/content marketing space in 2011, including articles, blog posts, SEO, Q&A, and profiles. My specialties are cy…