Thanks to the exponential growth of malware, traditional heuristics-based detection regimes have been overwhelmed, leaving computers at risk. Machine learning approaches can help, but the bottleneck presented by the feature engineering step is a potential dealbreaker. The best path forward at this point is deep learning, says the CEO of Deep Instinct, which claims to have taken an early lead in the emerging field.
Ten years ago, the cybersecurity industry faced a dilemma. The volume of malware was exploding, with tens of thousands of new types discovered every day. Traditional antivirus products, which were evolving from rudimentary signature-based methods to slightly more advanced heuristics-based approaches, were struggling to keep up.
Classical machine learning approaches, with their potential to automate the identification of anomalies hidden amid vast amounts of incoming bytes, offered a potential path forward. Many security software vendors added machine learning capabilities to their traditional heuristics-based antivirus engines, with the hope of catching more malware before it infected systems.
Progress was being made, but data volumes kept growing at a geometric rate. Today, security firms estimate there are anywhere from 500,000 to 700,000 new malware types identified per day. Keeping up with that analytical workload is stressing both humans and machines, says Guy Caspi, CEO of Deep Instinct.
The biggest problem with traditional machine learning approaches is feature engineering, Caspi says. In order to train the machine learning model to identify new malware types, human analysts are needed to identify the features of the new malware.
“This is the [reason] most of the machine learning companies need to update 15 to 20 times a day,” Caspi tells Datanami. “It’s almost Mission Impossible to digest all these processes. This is why you see a ransomware pandemic. They can’t stop malwares that are coming with the ransomware.”
In 2015, Caspi and his colleagues, Eli David and Nadav Maman, co-founded Deep Instinct with the idea of using emerging deep learning approaches to bolster cybersecurity. With deep learning, the malware detection regime moves further up the abstraction stack. Instead of looking for specific snippets of malware code or relying on other techniques that demand an exact match, deep learning takes a more generalized approach, which allows it to spot zero-day threats at a much higher rate than other approaches, the company says.
“It’s very flexible because deep learning is imitating the way our brain is thinking,” Caspi says. “Deep learning is working directly on the raw bytes. You just throw all the data on the brain and it learns. It learns because the data has been labeled in advance.”
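To make the contrast concrete, here is a minimal sketch in Python of the two kinds of model input being described. It is illustrative only and is not Deep Instinct's method: the function names, the three hand-picked features, and the 16-byte input length are all assumptions chosen for the example.

```python
import math
from collections import Counter


def handcrafted_features(data: bytes) -> dict:
    """Hypothetical analyst-defined features (the 'feature engineering' step).

    Each feature here was chosen by a person; a new malware family may
    require someone to design and ship additional ones.
    """
    n = len(data)
    counts = Counter(data)
    # Shannon entropy of the byte distribution, in bits per byte.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "size": n,
        "entropy": entropy,
        # "MZ" is the magic number at the start of Windows executables.
        "has_mz_header": data[:2] == b"MZ",
    }


def raw_byte_input(data: bytes, length: int = 16) -> list:
    """Hypothetical raw-byte input for a neural network.

    No features are designed by hand: the file is truncated or zero-padded
    to a fixed length and each byte is scaled to the range [0, 1].
    """
    padded = data[:length].ljust(length, b"\x00")
    return [b / 255.0 for b in padded]


if __name__ == "__main__":
    sample = b"MZ\x90\x00"  # toy bytes, not a real executable
    print(handcrafted_features(sample))
    print(raw_byte_input(sample, length=8))
```

In the first function a human decides what matters. In the second, the labeled training data is left to determine which byte patterns matter, which is the distinction Caspi is drawing.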
Caspi uses the familiar example of identifying cats and dogs to explain the difference between traditional machine learning and novel deep learning approaches.
“If I give you a picture of a cat or a dog that you’ve never seen, you will still have the understanding that this is a dog and this is a cat. The reason for that is you have been exposed to hundreds of dogs and cats,” Caspi says. “If you go to the machine learning, it will tell you, this is dog and this is the breed of the dog. If you send it a different dog, it will say, what is this? So this is the difference between machine learning and deep learning.”
As Caspi mentioned, there is a catch to deep learning: the need to label the data in advance. This poses a substantial challenge, and is something that the Deep Instinct team spent years addressing. The company developed an automated pre-processing step that can account for the large differences in the raw data used for training the deep learning model.
Humans still play a role in the deep learning loop at Deep Instinct, which has over a dozen PhD-level data scientists trained in deep learning. But since humans aren’t needed to perform the feature engineering step required for daily updates to end point software, the role humans play is not as time-critical. Because its deep learning model essentially is continuously learning and refining its definition of malware based on billions of samples gleaned from malware repositories, such as MITRE ATT&CK, Deep Instinct only needs to update the inference algorithm that implements new attack vectors twice per year, Caspi says.
The last time we visited with Deep Instinct, the company had just a handful of customers. But business has blossomed since then, thanks in large part to an OEM deal with HPE that accounts for about a million end points. All told, the company today has more than 2,500 paying customers and is protecting more than 3 million end points, including PCs, mobile phones, and other devices, Caspi says.
We’re currently in a state of upheaval and change in the cybersecurity market, with trusted names like Symantec and McAfee out of the picture. Malware detection regimes that are based on heuristics alone are badly outmatched by the malware makers, who are using automation to crank up production of their horrible products and overwhelm outdated defenses. The standard bearers in the market today are machine learning-based approaches, according to Caspi, but even they’re struggling to keep up. That leaves Deep Instinct and a handful of other vendors treading the deeper neural network waters.
Caspi is clearly proud of what his team has accomplished at Deep Instinct, which in April completed a $100-million Series D round of funding led by BlackRock, and which is also financially backed by Samsung, LG, and NVIDIA.
“I think it’s game-over,” Caspi says. “It’s not 100% bulletproof. But if you see our results, it’s by an order of magnitude better than any other vendor in the market, prevention-wise. I can tell you that in the last six months, big vendors when they hear that there is a POC with Deep Instinct, they don’t want to compete.”
Deep Instinct has received five patents for its software, Caspi says. The barrier to entry in applying deep learning to cybersecurity is quite steep, which gives Deep Instinct a decided advantage, even over the tech giants, he says.
“There are no people in the world in this domain. It’s still a very, very small domain,” he says. “There is a huge amount of other problems that do not exist almost in any other domain….and they exist in cyber security because in cyber security, it’s a mess. It’s a huge amount of data, very complex.”
Caspi suggested the barrier to entry was too great even for Google, which he says tried to use TensorFlow to create a malware detection engine. “It’s great for convolutional neural networks, if you want to do computer vision. For medical application, that’s great,” he says. “If you want to have something like cybersecurity, which has thousands of different parameters and not just three, it’s Mission Impossible. And you have to do it in runtime.”
The recent SolarWinds hack provided a handy test case for Deep Instinct. None of the customers using its software were compromised by the attack, Caspi says. Only Deep Instinct and Palo Alto Networks were able to make that claim, he says.
Looking forward, Deep Instinct plans to ramp up its sales and marketing initiatives with the $100 million Series D round. The company may have another round of funding before going public, Caspi says.