Machine learning adoption has exploded over the past decade, driven in part by the rise of cloud computing, which has made high-performance computing and storage accessible to businesses of all sizes. As vendors integrate machine learning into products across industries, and users rely on its output in their decision making, security experts warn of adversarial attacks designed to abuse the technology.
Most social networking platforms, online video platforms, large shopping sites, search engines and other services have some sort of recommendation system based on machine learning. The movies and shows that people like on Netflix, the content that people like or share on Facebook, the hashtags and likes on Twitter, the products consumers buy or view on Amazon, the queries users type in Google Search are all fed back into these sites’ machine learning models to make better and more accurate recommendations.
It’s not news that attackers try to influence and skew these recommendation systems by using fake accounts to upvote, downvote, share or promote certain products or content. Such manipulation services can be bought on the underground market, as can access to the “troll farms” used in disinformation campaigns to spread fake news.
“In theory, if an adversary has knowledge about how a specific user has interacted with a system, an attack can be crafted to target that user with a recommendation such as a YouTube video, malicious app, or imposter account to follow,” Andrew Patel, a researcher with the Artificial Intelligence Center of Excellence at security vendor F-Secure explained in a blog post. “As such, algorithmic manipulation can be used for a variety of purposes including disinformation, phishing scams, altering of public opinion, promotion of unwanted content, and discrediting individuals or brands. You can even pay someone to manipulate Google’s search autocomplete functionality.”
What is data poisoning?
Data poisoning or model poisoning attacks involve polluting a machine learning model’s training data. Data poisoning is considered an integrity attack because tampering with the training data impacts the model’s ability to output correct predictions. Other types of attacks can be similarly classified based on their impact:
Confidentiality, where the attackers can infer potentially confidential information about the training data by feeding inputs to the model
Availability, where the attackers disguise their inputs to trick the model in order to evade correct classification
Replication, where attackers can reverse-engineer the model in order to replicate it and analyze it locally to prepare attacks or exploit it for their own financial gain
The difference between an attack that is meant to evade a model’s prediction or classification and a poisoning attack is persistence: with poisoning, the attacker’s goal is to get their inputs to be accepted as training data. The length of the attack also differs because it depends on the model’s training cycle; it might take weeks for the attacker to achieve their poisoning goal.
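To make that distinction concrete, here is a toy sketch, entirely illustrative and using a simple nearest-centroid classifier rather than any real product’s model: an evasion attack crafts a single input against a fixed model, while a poisoning attack corrupts the training data itself, so the damage persists in the model across every future prediction.

```python
# Toy illustration: a nearest-centroid classifier over 1-D feature values.
# Poisoning works by getting mislabeled points accepted as training data,
# which permanently shifts the learned class centroids.

def train_centroids(samples):
    """samples: list of (value, label) pairs; returns per-class mean."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Clean training data: "benign" activity clusters near 1.0, "malicious" near 9.0.
clean = [(v, "benign") for v in (0.8, 1.0, 1.2)] + \
        [(v, "malicious") for v in (8.8, 9.0, 9.2)]

model = train_centroids(clean)
print(predict(model, 6.0))   # classified as malicious

# Poisoning: mislabeled points accepted over several training cycles drag
# the "benign" centroid toward the malicious region.
poisoned = clean + [(v, "benign") for v in (7.0, 7.5, 8.0)]
model = train_centroids(poisoned)
print(predict(model, 6.0))   # now classified as benign
```

Note that once the poisoned points are in the training set, every retraining on that data reproduces the corrupted centroid, which is exactly the persistence that distinguishes poisoning from one-off evasion.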
Data poisoning can be achieved either in a black-box scenario, against classifiers that rely on user feedback to update their learning, or in a white-box scenario, where the attacker gains access to the model and its private training data, possibly somewhere in the supply chain if the training data is collected from multiple sources.
Data poisoning examples
In a cybersecurity context, the target could be a system that uses machine learning to detect network anomalies that could indicate suspicious activity. If an attacker understands that such a model is in place, they can attempt to slowly introduce data points that decrease the accuracy of that model, so that eventually the things that they want to do won’t be flagged as anomalous anymore, Patel tells CSO. This is also known as model skewing.
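A minimal sketch of that skewing dynamic, assuming a hypothetical detector that flags any value more than three standard deviations above the mean of its training data and folds unflagged traffic back into the next training cycle (the retraining rule and all numbers are invented for illustration):

```python
import statistics

def threshold(data):
    """Anomaly cutoff: mean plus three standard deviations."""
    return statistics.mean(data) + 3 * statistics.pstdev(data)

training = [10.0, 11.0, 9.0, 10.5, 9.5]    # normal traffic volumes
attack_value = 30.0                         # clearly anomalous at first
assert attack_value > threshold(training)   # flagged today

# The attacker slowly feeds borderline-high values that each pass the
# current check and are then accepted as "normal" training data.
probe = max(training)
for _ in range(40):
    probe = min(probe * 1.1, threshold(training) - 0.01)  # stay just under
    training.append(probe)

print(threshold(training))                  # threshold has crept upward
print(attack_value < threshold(training))   # True: the attack is no longer flagged
```

Each poisoned point is individually unremarkable, which is why this kind of slow skew is hard to catch without looking at the aggregate drift.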
A real-world example of this is attacks against the spam filters used by email providers. In a 2018 blog post on machine learning attacks, Elie Bursztein, who leads the anti-abuse research team at Google, said: “In practice, we regularly see some of the most advanced spammer groups trying to throw the Gmail filter off-track by reporting massive amounts of spam emails as not spam […] Between the end of Nov 2017 and early 2018, there were at least four malicious large-scale attempts to skew our classifier.”
Another example involves Google’s VirusTotal scanning service, which many antivirus vendors use to augment their own data. While attackers have been known to test their malware against VirusTotal before deploying it in the wild to make sure it evades detection, they can also use the service for more persistent poisoning. In fact, in 2015 there were reports of intentional sample-poisoning attacks through VirusTotal designed to cause antivirus vendors to detect benign files as malicious.
No easy fix
The main problem with data poisoning is that it’s not easy to fix. Models are retrained with newly collected data at certain intervals, depending on their intended use and their owner’s preference. Since poisoning usually happens over time, and over some number of training cycles, it can be hard to tell when prediction accuracy starts to shift.
Reverting the poisoning effects would require a time-consuming historical analysis of inputs for the affected class to identify all the bad data samples and remove them. Then a version of the model from before the attack started would need to be retrained. When dealing with large quantities of data and a large number of attacks, however, retraining in such a way is simply not feasible and the models never get fixed, according to F-Secure’s Patel.
“There’s this whole notion in academia right now that I think is really cool and not yet practical, but we’ll get there, that’s called machine unlearning,” Hyrum Anderson, principal architect for Trustworthy Machine Learning at Microsoft, tells CSO. “For GPT-3 [a language prediction model developed by OpenAI], the cost was $16 million or something to train the model once. If it were poisoned and identified after the fact, it could be really expensive to find the poisoned data and retrain. But if I could unlearn, if I could just say ‘Hey, for these data, undo their effects and my weights,’ that could be a significantly cheaper way to build a defense. I think practical solutions for machine unlearning are still years away, though. So yes, the solution at this point is to retrain with good data and that can be super hard to accomplish or expensive.”
Prevent and detect
Given the difficulties in fixing poisoned models, model developers need to focus on measures that could either block attack attempts or detect malicious inputs before the next training cycle happens—things like input validity checking, rate limiting, regression testing, manual moderation and using various statistical techniques to detect anomalies.
For example, a small group of accounts, IP addresses, or users shouldn’t account for a large portion of the model training data. Restrictions can be placed on how many inputs from any single user are accepted into the training data, or on the weight those inputs carry. Newly trained classifiers can be checked against previous versions via dark launches—rolling them out to only a small subset of users—and their outputs compared. In his blog post, Google’s Bursztein also recommended building a golden dataset that any retrained model must accurately predict, which can help detect regressions.
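Two of these defenses can be sketched in a few lines. The function names, the per-user cap, and the accuracy threshold below are illustrative choices, not any vendor’s actual pipeline:

```python
from collections import defaultdict

def cap_per_user(submissions, max_per_user=2):
    """Limit how many training samples any single account can contribute."""
    kept, counts = [], defaultdict(int)
    for user, sample in submissions:
        if counts[user] < max_per_user:
            counts[user] += 1
            kept.append(sample)
    return kept

def passes_golden_set(model_predict, golden, min_accuracy=0.95):
    """Gate a retrained model on a trusted, hand-verified dataset
    (Bursztein's 'golden dataset' idea): reject it if accuracy regresses."""
    correct = sum(1 for x, y in golden if model_predict(x) == y)
    return correct / len(golden) >= min_accuracy

# One account flooding the feedback channel gets capped:
submissions = [("attacker", "spam-is-ok")] * 50 + [("alice", "ham"), ("bob", "ham")]
print(len(cap_per_user(submissions)))   # 4 samples survive, not 52

# A regressed model that mislabels golden samples is rejected:
golden = [("viagra offer", "spam"), ("meeting at 3", "ham")]
bad_model = lambda x: "ham"             # skewed: calls everything ham
print(passes_golden_set(bad_model, golden))   # False
```

The cap bounds how much influence any one identity can buy, while the golden-set gate catches a skewed classifier before it ships.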
According to Anderson, data poisoning is just a special case of a larger issue called data drift that affects every deployed system. Everyone gets bad data for a variety of reasons, and there is plenty of research on dealing with data drift, as well as tools to detect significant changes in operational data and model performance, including from large cloud computing providers. Azure Monitor and Amazon SageMaker are examples of services that include such capabilities.
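One common way such monitoring tools quantify drift is the population stability index (PSI). The sketch below is a generic PSI check with invented sample data, not the actual implementation inside Azure Monitor or SageMaker:

```python
import math

def psi(expected, actual, bins=4, lo=0.0, hi=1.0):
    """Population Stability Index between a baseline and a live sample;
    values above ~0.2 are conventionally treated as significant drift."""
    def hist(data):
        counts = [0] * bins
        for x in data:
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1)
            counts[i] += 1
        # smooth empty bins so the log is always defined
        return [max(c, 1) / max(len(data), 1) for c in counts]
    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]   # training-time scores
stable   = [0.12, 0.18, 0.22, 0.28, 0.33, 0.38, 0.42, 0.44]
drifted  = [0.7, 0.75, 0.8, 0.85, 0.9, 0.92, 0.95, 0.99]  # post-retraining shift

print(psi(baseline, stable) < 0.2)    # True: distribution looks unchanged
print(psi(baseline, drifted) > 0.2)   # True: alert and inspect the new data
```

As Anderson notes, the alert fires whether the shift came from a poisoning attack or just a bad batch of data; either way the batch gets inspected before it does lasting damage.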
“If your model’s performance after a retraining takes a dramatic hit, whether or not it’s a poisoning attack or just a bad batch of data is probably immaterial and your system can detect that,” Anderson says. “If you manage to fix that, then you can either root out that targeted poisoning attack or the bad batch of data that inadvertently got inside your data aperture when you trained your model. So those kinds of tools are a good start and they’re kind of in this AI risk management framework that’s beginning to materialize in the industry.”
To perform data poisoning, attackers also need to gain information about how the model works, so it’s important to leak as little information as possible and have strong access controls in place for both the model and the training data. In this respect, machine learning defenses are tied to general security practices and hygiene—things like restricting permissions, enabling logging, and using file and data versioning.
“A lot of security in AI and machine learning has to do with very basic read/write permissions for data or access to models or systems or servers,” Anderson says. “It’s a case where a small over permissive data provider service or file in some directory could lead to a poisoning attack.”
Go on the attack
Just as organizations run regular penetration tests against their networks and systems to discover weaknesses, they should expand those tests to the machine learning context and treat machine learning models as part of the security of the larger system or application.
“I think the obvious thing that developers should do with building a model is to actually attack it themselves to understand how it can be attacked and by understanding how it can be attacked, they can then attempt to build defenses against those attacks,” Patel says. “Your detection is going to be based on what you found from the red teaming so when you put together attacks against the model, you can then understand what the data points would look like, and then accordingly, you would build mechanisms that are able to discard the data points that look like poisoning.”
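That loop—attack first, then build a filter from what the poisoned points look like—can be sketched as follows. The median/MAD outlier filter and its cutoff are illustrative choices, not Patel’s prescribed defense:

```python
import statistics

def red_team_poison(clean, target=50.0, n=3):
    """Simulate the attack: inject points pulled toward the attacker's goal."""
    return clean + [target] * n

def filter_suspect(samples, cutoff=5.0):
    """Discard samples far from the median, scaled by the median absolute
    deviation (MAD). Unlike the mean and standard deviation, these robust
    statistics are not dragged toward the poisoned points themselves."""
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples) or 1.0
    return [x for x in samples if abs(x - med) / mad < cutoff]

clean = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]
poisoned = red_team_poison(clean)

survivors = filter_suspect(poisoned)
print(survivors == clean)   # True: the injected points were discarded
```

Running your own poisoning attack first is what tells you the injected points cluster far from the legitimate data, which is what justifies a robust-distance filter like this one in the first place.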
Anderson is actively involved with this at Microsoft. In a recent talk at the USENIX Enigma conference, he presented a red team exercise at Microsoft where his team managed to reverse-engineer a machine learning model that was being used by a resource provisioning service to ensure efficient allocation and mapping of virtual resources to physical hardware.
Without direct access to the model, the team found enough information about how it collected data to create a local replica and test evasion attacks against it without being detected by the live system. This allowed them to identify which combinations of virtual machines and databases, of what sizes and replication factors, requested at which times of day and in which regions, would with high probability lead the machine learning model to overprovision the requested resources on physical hosts that also hosted high-availability services.
With those over-provisioned resources, they launched a noisy neighbor attack with payloads that had high CPU and RAM usage to cause a denial of service attack against the high-availability services also hosted on the same hardware. “The attack had striking similarities to adversary activity on any IT system,” Anderson concluded in the talk. “There was exfiltration, evasion, and execution, finally ending up with the impact of service availability.”