Even at this early stage of the game, machine learning holds much promise, and is being applied to remarkably diverse fields, including autonomous driving, medical screening, and supply-chain management. In many of these fields the technology has been extremely successful, in some cases forecasting consumer demand and disease outbreaks more reliably than human analysts.
However, there remain some problems with the basic way in which machine learning works. ML algorithms require huge amounts of data, and data processing capability, to provide reliable predictions. Even if these resources are available, the algorithms can fail. What’s worse, they often fail in unexpected ways, which makes managing the risks of deploying them difficult.
In this article, we’ll look at where predictive ML falls short, the implications of this, and what can be done about it.
There are more than a few examples of ML failing to make reliable (or even useful) predictions. One of the earliest was Google Flu Trends, a program built by Google that attempted to predict flu outbreaks from searches for information about the flu. To start with, the model was accurate, but around 2013 it ran into trouble, predicting far more flu activity than was actually occurring. Google stopped publishing the program’s estimates, very quietly, in 2015.
A more spectacular example occurred in April 2013, when Syrian hackers broke into the Associated Press Twitter feed. They used this illegal access to send a tweet that read: “Breaking: Two Explosions in the White House and Barack Obama is injured”. Human journalists (or at least the responsible ones) immediately sought to verify the story and quickly found it to be fake, but the ML algorithms used to trade on the US stock market did not. They rated the news as reliable, and the Dow Jones Industrial Average dropped by well over 100 points, temporarily wiping about $136 billion off the value of US stocks. The plunge was corrected about 3 minutes later, but it shook confidence in the market for years.
Ten years ago, the failures of ML systems were well understood but remained an academic curiosity. Today, ML algorithms are deployed in so many fields that the implications of failure could be severe. They are now used to make healthcare decisions, detect identity theft and fraud, and even inform court decisions, and in all of these areas the failure of an ML system could lead to significant human suffering.
The Reasons for Failure
ML algorithms fail for two major reasons. The first is that users actively try to subvert the model that the ML system has learned. This was partially the case, for instance, with the Google flu algorithm mentioned above, but it also frequently occurs in other fields. The whole industry of cybercrime, for instance, relies on the ability of humans to act unexpectedly, and so avoid coming to the attention of ML-driven cybersecurity systems.
The second major reason for failure is an abrupt change in human behavior. This is seen most commonly in shopping algorithms, not least because this is where ML models are most widely applied. An ML system might work out, for instance, that searches for and purchases of hand sanitizer are a good predictor of someone being pregnant. Unfortunately, if a global pandemic breaks out (say, Covid-19), the model will suddenly report that half the world is pregnant. This might seem like an unusual example, but in reality online retail algorithms are very often confused by abrupt changes in what people are buying, and these changes in consumer behavior also affect algorithms that model inventory management and fraud detection.
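One common way to catch this kind of abrupt shift is to monitor how often an input signal occurs and compare recent traffic against a baseline. The sketch below is a minimal illustration, not a method used by any particular retailer: the function names, the example counts, and the threshold of 4.0 are all assumptions made for the example.

```python
import math


def drift_z_score(baseline_hits, baseline_total, recent_hits, recent_total):
    """Two-proportion z-score comparing a recent rate against a baseline rate."""
    if baseline_total <= 0 or recent_total <= 0:
        raise ValueError("both sample sizes must be positive")
    p_base = baseline_hits / baseline_total
    p_recent = recent_hits / recent_total
    # Pooled rate under the assumption that nothing has changed.
    pooled = (baseline_hits + recent_hits) / (baseline_total + recent_total)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / baseline_total + 1 / recent_total)
    )
    if std_err == 0:
        return 0.0  # the signal is all-or-nothing in both windows
    return (p_recent - p_base) / std_err


def has_drifted(baseline_hits, baseline_total, recent_hits, recent_total,
                threshold=4.0):
    """Flag drift when the rate moves more than `threshold` standard errors.

    The default threshold is an illustrative assumption, not a standard value.
    """
    z = drift_z_score(baseline_hits, baseline_total, recent_hits, recent_total)
    return abs(z) > threshold


# Hypothetical counts: shoppers who bought hand sanitizer, out of all shoppers.
print(has_drifted(200, 10_000, 42, 2_000))    # an ordinary week
print(has_drifted(200, 10_000, 900, 2_000))   # a pandemic week
```

A check like this does not fix the model. It only signals that predictions built on the shifted feature should be treated with suspicion until the model is retrained or reviewed.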
In both failure modes, the implications of a malfunctioning ML system are made more severe by another factor: in many cases the model itself does not accurately estimate the reliability of its predictions. Recent research indicates that ML models trained to recognize images are not only bad at assessing image types they have not seen before, but (worse) report high levels of confidence in their incorrect guesses.
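A toy example shows why a model’s reported confidence is not the same thing as reliability. The sketch below is not the image-recognition research mentioned above: it is a hypothetical two-class linear model with made-up weights, whose raw scores are turned into probabilities with softmax. Because the scores grow with distance from the decision boundary, an input far from anything the model was built for receives a confidence of almost 1.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for a two-class linear model (an assumption).
WEIGHTS = [
    [1.0, -1.0],   # class 0
    [-1.0, 1.0],   # class 1
]


def predict_proba(features):
    """Return class probabilities for a two-feature input."""
    logits = [
        sum(w * x for w, x in zip(class_weights, features))
        for class_weights in WEIGHTS
    ]
    return softmax(logits)


typical_input = [1.0, 0.5]    # close to the decision boundary
unusual_input = [50.0, 0.0]   # far outside any plausible training range

print(max(predict_proba(typical_input)))   # about 0.73
print(max(predict_proba(unusual_input)))   # very close to 1.0
```

The unusual input gets the higher confidence simply because it is further from the boundary, even though nothing suggests the model has ever seen anything like it.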
The Solution: Robustness
Overcoming these challenges will not be easy, and will likely rely on contributions from many fields of computer science. Some researchers, for instance, are working on more advanced data analysis techniques such as topological data analysis, which promises to allow ML systems to assess their training data more effectively. While these approaches are being explored, however, we will need to look at the way that ML systems are embedded in commercial and social systems, and seek to limit the effects of their failure.
The study of ML failure, and its implications, has come to be known as the study of “robustness.” Researchers in the field point out that the best applications of machine learning to date have been those where algorithms work in tandem with humans, rather than attempting to make black-and-white decisions about unique situations. Many of the ML systems now deployed in healthcare settings, for instance, can help medical staff to make diagnoses, but no one is suggesting that ML predictions are anything more than a guide.
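The idea of algorithms working in tandem with humans can be sketched as a simple triage rule: the model proposes a label, and anything below a confidence threshold is routed to a person. This is a minimal illustration under stated assumptions, not a description of any deployed healthcare system. The `Decision` class, the `triage` function, the labels, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Decision:
    label: str
    confidence: float
    needs_human_review: bool


def triage(probabilities: Dict[str, float],
           min_confidence: float = 0.9) -> Decision:
    """Pick the most likely label and flag low-confidence cases for review.

    `min_confidence` is an illustrative threshold, not a recommended value.
    """
    if not probabilities:
        raise ValueError("at least one class probability is required")
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return Decision(label, confidence, confidence < min_confidence)


# Hypothetical model outputs for two scans.
print(triage({"benign": 0.97, "malignant": 0.03}))
print(triage({"benign": 0.55, "malignant": 0.45}))
```

Since a model’s confidence can itself be unreliable, a threshold like this is best treated as one safeguard among several. In a setting such as diagnosis, the prediction remains a guide for the clinician either way.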
Limiting the Risk
This is not to say, of course, that ML systems have no place in the modern economy. Speech recognition, machine translation, and web search applications, where large, continually updating models can deliver accessibility improvements with little risk, are likely to continue to be major beneficiaries of ML.
The issue will come when ML systems are given the power to make binary decisions about rapidly changing situations. In these cases, it’s abundantly apparent that the reliability of these systems will need to be dramatically improved before we can rely on them.
About the Author
Bernard Brode has spent a lifetime delving into the inner workings of cryptography and now explores the confluence of nanotechnology, AI/ML, and cybersecurity.