Artificial Intelligence and Machine Learning – Present & Future

Steve King: [00:13] Good day, everyone. This is Steve King. I’m the managing director at CyberTheory. Today’s episode is going to talk about machine learning and artificial intelligence: where we’ve come from and where we’ve got to go. And I have the pleasure of talking this morning with Liran Paul Hason, who’s the co-founder and CEO of Aporia. Aporia is a full-stack machine learning observability platform used by Fortune 500 companies and data science teams around the world. It helps monitor billions of daily predictions and maintain AI responsibility and fairness, and we’ll get into what that means in a minute. Prior to founding Aporia, Liran was a machine learning architect at a company called Adallom, which was acquired by Microsoft. He wrote his first programs and applications in his early teens, and he served five years in the IDF before leaving with the rank of captain. So welcome, Liran, how are you?

Liran Paul Hason: [01:21] Great. Thank you. Happy to be here.

King: [01:25] Thank you for joining us today. So let’s dive right in. Tell me what the current state of AI and ML is in cybersecurity, and what the most interesting and promising applications for these technologies are right now.

Hason: [01:43] So, I think it’s interesting to look at what has happened in the cybersecurity space with regard to AI. If we go back in time, in the past few years a lot of companies bought visibility solutions, for example, a solution that gives you visibility into the devices in your organization. So if you have Mac devices, mobile devices, maybe a smart car, you want to be able to answer how many devices you have in your organization. But that’s just one; then we have CSPM for providing visibility into cloud security, and so on. So a lot of security teams acquired a lot of visibility tools, which left them with a lot of data. They have a lot of data about what’s going on within their network, what devices they have, what activity those devices are doing and what the users are doing. Maybe a user is connecting from California and, a few minutes later, the same user is connecting from another country. So they were able to gather a lot of data, which created a huge challenge: what do we do with all this data? Because if we need our security team to review all this information and all these alerts, it will never end. And not only that, usually there are plenty of false positives. So how do you deal with that? On one hand, it’s a challenge, but it’s also an opportunity, because once you have a lot of data, you can start mining it to derive insights and make better decisions. And that’s what we’ve seen, I think, in the last few years: more cybersecurity solutions have started to incorporate AI technology in their offerings, allowing their end users, the different security teams, to make smarter decisions instead of losing themselves in piles of data. They can get much more sophisticated and to-the-point insights and alerts on different security events.

King: [03:57] I was just going to ask you, though: you were going to talk about what you think the most promising applications for AI and ML are right now.

Hason: [04:07] So, there are multiple applications for AI and ML, and just to name a few: back in the day at Adallom, I was part of the Adallom Labs team, where we collected a lot of data from our clients’ cloud environments, and we implemented machine learning models to find anomalies in the behavior of those users. Think about a Fortune 500 company with tens of thousands of users. Each one of them behaves differently, so just setting a set of rules to find a potential attack would usually end up with tons of false-positive alerts. So that’s exactly what we did back then: we utilized machine learning to identify anomalies in the behavior of each user within the organization. Our system was able to alert upon suspicious behavior because it had already learned the baseline behavior of each user, and when something changed dramatically, it could alert.
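
To make that concrete, here is a minimal sketch in Python of the kind of per-user baseline anomaly detection being described. It is only an illustration, not Adallom’s actual system; the activity features, threshold and data are invented:

```python
# A minimal, invented sketch of per-user baseline anomaly detection, in the
# spirit of what is described above (not Adallom's actual implementation).
import numpy as np

class UserBaseline:
    """Learns a per-user baseline from historical activity and flags new
    observations that deviate strongly from it."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold      # standard deviations that count as "anomalous"
        self.mean = None
        self.std = None

    def fit(self, history: np.ndarray) -> "UserBaseline":
        # history: one row per day of activity features, e.g. [logins, downloads, deletions]
        self.mean = history.mean(axis=0)
        self.std = history.std(axis=0) + 1e-9   # avoid division by zero
        return self

    def is_anomalous(self, activity: np.ndarray) -> bool:
        z = np.abs((activity - self.mean) / self.std)
        return bool((z > self.threshold).any())

# A user who normally downloads a handful of files suddenly downloads 500.
history = np.array([[5, 3, 0], [7, 2, 1], [6, 4, 0], [5, 3, 1]])
baseline = UserBaseline().fit(history)
print(baseline.is_anomalous(np.array([6, 500, 0])))   # True, so raise an alert
```

The point is simply that the baseline is learned from each user’s own history, so the same activity can be normal for one user and anomalous for another.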

Steve King: [05:18] So was that working off of a behavioral model that tried to say, “normal employee behavior is X, and anything beyond that, X plus Y, would equal anomalous behavior,” and then flag that?

Hason: [05:39] Yeah, so that would be a very simple description of that.

King: [05:43] Our audience and I are both simple-minded folks; trying to wrap our brains around machine learning is not going to happen. But from a use-case point of view, a lot of us worry about how much of the human factor is involved in cybersecurity these days. And what many of us want is to have that human factor automated where it’s not being used for analytical or complex decision-making purposes. At least from my point of view, and I think from our audience’s point of view, that’s what folks are hoping AI is going to be able to do someday, so that we can lower the instances of human-error-caused breaches.

Hason: [06:36] Absolutely. So, when you have that much data, like all this data from the different cloud providers: for example, Steve, as a user, goes to G Suite and starts downloading files from Google Drive. Later on, he goes into a few emails, then he goes back and maybe deletes a few files. So what is normal behavior for Steve? And when does it start to be anomalous? That is a very open, challenging question to answer, especially when each and every person is different. And as you said, it becomes very challenging, and this is where AI comes in handy. Instead of a security engineer trying to analyze these piles of data and activity, we see more companies using AI to analyze the data, find the anomalous behaviors and, therefore, reduce the number of false-positive alerts they get on potential attacks.

King: [07:42] Yeah, there are privacy issues around using that data, as well. And I don’t think there are a lot of machine learning models leveraging synthetic data for testing and product evaluation.

Hason: [08:01] Yeah. So, this is another concern when it comes to AI and machine learning. Let’s talk a bit about what machine learning is, how it works and how it is different from traditional software engineering. In traditional software engineering, you have a software engineer who can manually review the data, analyze some kind of behavior and code a specific detection for that behavior. For example: if a user logged in from Arizona in the U.S., and an hour later logged in from France, well, there’s no way that could happen unless he’s Superman. That’s not the case, as far as I know, and therefore it would be considered an anomaly. That is a simple behavior that a software engineer can write down and implement. But when we think about cybersecurity attacks, and the way attackers have evolved in recent years, they have become more sophisticated, so these kinds of hand-written behaviors and algorithms wouldn’t be enough to capture an attack or to realize that an attack happened. And that’s where AI becomes powerful. So, just to elaborate a bit more about what AI is and how it works: instead of this engineer writing a specific set of rules to identify a behavior, we just take piles of data, for example, the past five years of network activity and user activity, some of which is flagged as attacks or malicious behavior, and we let the machine learn and find out on its own what patterns identify an attack. At the end of this process, what we get is called a model, and this is essentially software derived from the data. We can now send it some new activity data, and this model will be able to tell us whether it is anomalous or suspicious, whether it might be an attack or not. So, to get to your question about data sensitivity: when it comes to machine learning models, data sensitivity is an interesting issue.
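
As a rough illustration of the contrast being drawn here, the sketch below puts a hand-written “impossible travel” rule next to a small model trained on labeled historical data. The library choice (scikit-learn), feature values and labels are all invented for the example:

```python
# An invented illustration of the contrast described above: a hand-written
# "impossible travel" rule versus a model learned from labeled historical data.
from datetime import datetime, timedelta
from sklearn.ensemble import RandomForestClassifier

# 1) Traditional software engineering: an engineer encodes one explicit rule.
def impossible_travel(prev_country, prev_time, country, time):
    """Flag logins from two different countries less than an hour apart."""
    return prev_country != country and (time - prev_time) < timedelta(hours=1)

print(impossible_travel("US", datetime(2022, 1, 1, 10, 0),
                        "FR", datetime(2022, 1, 1, 10, 30)))   # True

# 2) Machine learning: let the model find the patterns on its own.
# Each row is historical activity (distance from last login in km, hour of
# day, files touched); y marks sessions that were later confirmed as attacks.
X = [[5200, 3, 120], [2, 14, 4], [4800, 2, 90], [10, 10, 6]]
y = [1, 0, 1, 0]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# New, unseen activity is scored by the learned model instead of being
# matched against hand-written rules.
print(model.predict_proba([[4500, 4, 80]]))
```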

King: [10:27] Yeah, and I am assuming that we’ve got embedded machine learning models in applications today, like in finance or credit risk. If you figured out access, got into one of those models and made a couple of minor changes, you could manipulate a lot of the deterministic outcomes in terms of what the model was trying to do, and then you could turn that outcome in your favor. Isn’t that a big risk right now?

Hason: [11:14] Yeah. What you’re talking about applies when we use machine learning as part of a cybersecurity solution, and in general. When we use machine learning, one of the main risks is that a malicious attacker might manipulate the behavior of the machine learning model by affecting or changing the training data. How can they do that? You might think that they need to get access to the data, and that would require an attack of its own, but the reality is simpler. Why is that? Because we, as consumers, are using machine learning models even when we don’t realize it. For example, when we submit a loan application request, a lot of times there’s a machine learning model behind the scenes estimating the risk to the organization of giving us a loan. By sending an input to that system, our data becomes part of the feedback, and it’s going to be used for retraining a better model in the future. That means that as a regular user, as a consumer, you can affect the data that will go into the training of the model. You don’t have to be very sophisticated to do that. And what you might be able to do by doing so is manipulate the behavior of the model, if you’re able to insert the right data points.
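
A simplified, hypothetical sketch of that feedback loop is below. None of the names correspond to a real system; it only illustrates why data submitted by ordinary users can end up in the next training set:

```python
# Hypothetical sketch of the feedback loop described above; it only shows
# why data submitted by ordinary users ends up in the next training set.
feedback_log = []                       # inputs captured from live traffic
training_data = [                       # ((income, requested_amount), repaid?)
    ((85_000, 10_000), 1),
    ((20_000, 50_000), 0),
]

def handle_loan_application(income, requested_amount, model_score):
    """model_score stands in for the current model's risk estimate."""
    approved = model_score > 0.5
    feedback_log.append((income, requested_amount))   # applicant data is kept either way
    return approved

def retrain():
    """Periodically fold logged applications into the data for the next model."""
    for features in feedback_log:
        training_data.append((features, None))        # outcome label joined in later
    feedback_log.clear()

# An attacker does not need database access: submitting carefully chosen
# applications, at scale, is enough to influence what the next model learns.
handle_loan_application(1_000_000, 100, model_score=0.9)
retrain()
print(len(training_data))   # 3: the crafted data point is now in the training set
```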

King: [12:52] I think one of the use cases that most folks understand, and we’ve all probably read about, is the activity at Google around their research group that was trying to eliminate biases in modeling. In the employment application process, which a lot of systems today use AI and machine learning as part of, it’s freighted with biases, is it not? And isn’t that a problem and an opportunity as well? What has been your experience looking at that problem?

Hason: [13:42] Fairness is definitely an interesting challenge to deal with. In traditional software, you have an engineer you can ask, “How does the algorithm work? What is the logic here?” But you cannot ask the same questions of the builder of a model, because the model is essentially a kind of blackbox. You can think of it as a bunch of numbers that somehow work, but we are not clear about the exact logic. And because it’s a blackbox system, we don’t necessarily know whether the model has some unintentional bias or not. So that’s an interesting challenge that we at Aporia try to tackle and help organizations overcome. The way we do it is by tracking and monitoring these machine learning models as they run in production, and the system sees the decisions these models are making. If we go back to the loan application model, we can see how many users got rejected, how many got approved and whether there is any unintentional bias within the model.
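
For illustration, here is a hedged sketch of the kind of fairness check that production monitoring can run over logged predictions: compare approval rates across a sensitive attribute. The data, group names and threshold are invented, and this is not Aporia’s implementation:

```python
# Invented example of a demographic-parity style check over logged predictions.
from collections import defaultdict

# Each record: (sensitive_group, model_decision) collected from live predictions.
logged_predictions = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 0), ("group_b", 1), ("group_b", 0),
]

totals, approvals = defaultdict(int), defaultdict(int)
for group, decision in logged_predictions:
    totals[group] += 1
    approvals[group] += decision

rates = {g: approvals[g] / totals[g] for g in totals}
print(rates)   # {'group_a': 0.75, 'group_b': 0.25}

# Flag the model for review if approval rates diverge by more than a chosen margin.
if max(rates.values()) - min(rates.values()) > 0.2:
    print("Potential unintentional bias: alert the team reviewing this model")
```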

King: [14:59] That’s assessing or reviewing the output from the system. I think you characterized it as a blackbox, and when you look at it, it’s hard to know what’s going on inside. At some point, it’s building its own code vocabulary that we, as the creators, will never be able to understand or explain, because we can’t. You talk about probabilistic versus deterministic, and the accuracy is limited because of that; because it’s probabilistic, it’s limited to 10-15 points, not 100%. So, it’s not just a question of how we can trust a system when we can’t explain the way it comes to its predictions, but how do you explain those predictions to the regulatory folks in industries like the financial sector, where, at least in the United States, financial firms are obligated by law to explain to their customers why they were rejected for a loan application?

Hason: [16:16] Yeah, and this is a huge challenge for the financial sector in the U.S., because they’re obliged to provide these kinds of explanations for why they, as a business, made the decision, regardless of whether they’re using machine learning or not. But if they want to use machine learning, that becomes a real challenge, and that’s part of what our solution helps with. It doesn’t solve the blackbox problem of AI; instead, what it does is analyze the model and the decisions the model is making, and then it is able to generate a human-readable explanation for what led the model to reach a specific prediction. So, coming back to financial regulation: if you submitted a loan application and got rejected, with Aporia what you will get is a sentence telling you exactly, “The reason this loan application got rejected is because you have a low income,” and if you were to increase your income, you would become eligible for a loan. This is something we see more companies starting to pay attention to, focusing on solving and adopting solutions to overcome.

King: [17:34] So, because you focus more on categorical explanation versus specific explanation within the category, you’re able to satisfy audit requirements from a regulatory point of view. In other words, you just said, “you had a low-income issue, and that’s why we rejected you.” You don’t have to prove what that income was or what your standards are, or any of the rest of that. Is that correct?

Hason: [18:05] Yeah. But it can also get down to the level of a single data point at Aporia, and you can say, for a specific person, exactly what led the model to reach that decision.

King: [18:16] Is it ever wrong?

Hason: [18:17] No. The explanation, and the way it works, is based on a well-known concept called Shapley values; it’s an algorithm from game theory. So it’s very scientific, the way our system analyzes the behavior of the model.
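
For readers unfamiliar with the concept, below is a small, self-contained Python illustration of the Shapley-value idea from game theory: a feature’s contribution to a prediction is its average marginal effect across all subsets of the other features. The scoring function, feature names and numbers are invented, and this is not Aporia’s implementation:

```python
# Exact Shapley values over a toy, invented loan-scoring function.
from itertools import combinations
from math import factorial

FEATURES = ["income", "debt", "loan_amount"]
BASELINE = {"income": 60_000, "debt": 10_000, "loan_amount": 20_000}   # reference applicant
APPLICANT = {"income": 25_000, "debt": 30_000, "loan_amount": 40_000}  # person being explained

def score(x):
    """Toy risk score standing in for the model: higher means more likely to repay."""
    return 0.5 + x["income"] / 200_000 - x["debt"] / 100_000 - x["loan_amount"] / 400_000

def value(subset):
    """Model output when only the features in `subset` take the applicant's values."""
    x = {f: (APPLICANT[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return score(x)

def shapley(feature):
    n, total = len(FEATURES), 0.0
    others = [f for f in FEATURES if f != feature]
    for k in range(n):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(set(subset) | {feature}) - value(set(subset)))
    return total

contributions = {f: shapley(f) for f in FEATURES}
worst = min(contributions, key=contributions.get)
print(contributions)
print(f"This application was scored low mainly because of: {worst}")
```

Because the toy score is linear, the attributions reduce to each feature’s individual effect, but the same computation applies to more complex models, usually via approximations, since the exact sum grows exponentially with the number of features.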

King: [18:41] Forbes is very supportive. I guess they’ve named you as the next unicorn, the next billion-dollar startup. Did that just recently happen? Or was that earlier in the year?

Hason: [18:54] Yeah, that happened a few months ago. I think the last year was very significant for Aporia. We’ve been working with large organizations, as well as some smaller, leading tech companies, helping them get their machine learning models to production and helping them evaluate the business value they’re getting from those models. We started this discussion talking about cybersecurity, and in cybersecurity, having visibility and monitoring is very basic; it’s not a new idea. But in the machine learning world, because it’s a new field, a lot of companies are already running machine learning models in production and they’re lacking that visibility: what predictions these models are making, how well they’re performing. So having something like Aporia becomes a necessity. I believe that’s part of the reason Forbes sees Aporia that way. And in addition, if you think about it, in every industry, whether it’s cybersecurity or financial services or automotive, wherever you use machine learning in production and you make decisions that affect people or your business, you have to have some kind of visibility. So you have to have a solution like Aporia in place.

King: [20:18] Are your customers under any obligation to their customers to disclose what techniques and/or algorithms they’re using to make whatever determinations they’re making?

Hason: [20:34] I guess it depends. We are working with customers across industries; some of them are not obliged to do that, others are more regulated. We have clients in financial services where this is mandatory.

King: [20:50] Yeah. All right. Do you expect that in the next year or two we’re going to see AI and machine learning show up in the majority of applications that are developed, or however you want to describe those use cases, as we look around the computing environment? Is that your expectation?

Hason: [21:16] Definitely. And I think that, as consumers, we are already being affected by it more than we realize. In so many cases, when we go and search a website in order to buy something, which could be Amazon, for example: when you go to amazon.com, you have items that are recommended for you, and that is based on machine learning models. When you perform a credit card transaction and you get an SMS message saying that this is suspicious activity and you need to call this number, that is also based on machine learning. So we are affected very much by AI, whether we know it or not, and the pace at which the industry is growing is crazy. I see significant growth in the next few years.

King: [22:05] That’d be terrific for you. I agree; I’m sure that’s going to be the case. It does raise an issue in the world of privacy, however, and I’m not sure how we’re all going to deal with that, because there are regulatory bodies and federal agencies, in our country anyway, that just woke up, discovered that there could be a problem here and are flexing their muscles around the privacy issue, in spite of the fact that we happily give up all of our personally identifiable information anyway. I find it humorous that people get excited about this. Are there privacy issues that you see that will isolate an industry sector or group of applications from leveraging machine learning or artificial intelligence?

Hason: [23:08] Well, privacy is a super interesting challenge. Just think about Apple’s face detection; we all use it to unlock our phones, but what happens with that information? That’s a super interesting question. I believe this space should be regulated as well. I know that regulators, both in Europe and in the U.S., already have some active discussions about how to regulate AI and, in a way, to create a GDPR specifically for AI technologies, so that we, as a society, can adopt this technology in a responsible way. But I highly agree with you here: data privacy is an interesting challenge. There are multiple technologies and techniques to overcome these challenges when you build these machine learning models from the first phase, and we are starting to see them become more prominent.

King: [24:10] I’m sure it’ll be interesting to see how this all works out, and I’m sure you’ll be in the thick of things. I see by the clock on the wall that we’ve used up our half an hour, and I want to be conscious of that for your sake as well. So, thank you for spending this time with us. And I think, as you point out, there are some interesting conversations to be had going forward as well. If you’re willing, I’d like to have you back in six or nine months to revisit the same area and see what has happened and how much progress we’ve made.

Hason: [24:19] Definitely, that would be great. Thank you for the conversation. I enjoyed it.

King: [24:49] Good. I did too, and I hope our listeners did and until next time, I’m Steve King, signing off.