Understanding the vast role of AI and ML in cybersecurity

Artificial Intelligence (AI) refers to computer systems that mimic human intelligence, namely the ability to perceive, recognise patterns, and make decisions based on an algorithmic “gut feel”. Just like decisions made by human beings, these may not be a hundred percent accurate.

Machine Learning (ML) is the science of machines building and refining that ‘experience’ over time, as they learn from observations (context-cause-effect correlations).

Human perception is built from a complex set of correlations (for example, you can ‘see’ the colour of an object and, in some contexts, perceive it to be ‘hot’ without touching it), self-experiments (such as touching a tray straight out of the oven), and learning from others (such as watching people in pain after they accidentally picked up a sizzler plate with bare hands).

This simple temperature perception, as described in the example above, varies from person to person based on their exposure and experience.

AI/ML is useful when the input to the system (the sensed observation) is varied, high-dimensional, correlated in non-obvious ways, sparse, dynamic, and cannot be represented as simple static rules.

How have cyber-systems evolved?

Loss of unique identities

With the advent of the internet, systems were given unique network and device identifiers (such as IPv4 addresses and MAC addresses). The explosion of devices using the internet has created a scarcity in the pool of available network addresses. Network address translation (NAT) works around this by re-using the same private IP addresses across network hierarchies.

Similar changes are affecting network device identifiers (MAC addresses), which used to be unique by design. Contemporary systems (Android, iOS, Windows, etc.) rotate and anonymise MAC addresses to provide security and privacy.

With unique identities lost, a simple static rule can no longer distinguish a good source from a bad one. Establishing source reputation in this context (ambiguous identity) has to be augmented with AI/ML, which finds patterns and refines them over time as it learns more about the source.
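To make the idea of a reputation that is refined over time concrete, here is a minimal sketch. It assumes a source is identified by a behavioural fingerprint instead of an IP or MAC address, and it uses an exponentially weighted moving average, one of the simplest ways to update an estimate as evidence arrives. The class name, the fingerprint fields, and the parameter values are all hypothetical; a production system would more likely use a trained model.

```python
class ReputationTracker:
    """Keeps a running reputation score per behavioural fingerprint."""

    def __init__(self, alpha: float = 0.2, prior: float = 0.5):
        self.alpha = alpha  # how quickly new evidence overrides old evidence
        self.prior = prior  # neutral starting score for sources never seen before
        self.scores = {}

    def update(self, fingerprint: tuple, outcome: float) -> float:
        """Blend in one observation: 1.0 = benign interaction, 0.0 = malicious."""
        old = self.scores.get(fingerprint, self.prior)
        new = (1 - self.alpha) * old + self.alpha * outcome
        self.scores[fingerprint] = new
        return new

    def score(self, fingerprint: tuple) -> float:
        """Current reputation, or the neutral prior for an unknown source."""
        return self.scores.get(fingerprint, self.prior)


# Hypothetical fingerprint: attributes that survive NAT and MAC rotation.
source = ("Mozilla/5.0", "en-GB", "tls-profile-a")

tracker = ReputationTracker()
for _ in range(3):
    tracker.update(source, 0.0)  # three malicious interactions in a row
```

After three bad interactions the score for this fingerprint has dropped from the neutral 0.5 towards 0, while an unseen fingerprint still starts at 0.5. The point is only that the opinion about a source moves with evidence, which a static allow/deny rule cannot do.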

Rapid increase in the dimensionality of observations

The number of attributes needed to describe a threat actor is increasing rapidly. In the early days of computer viruses, a simple hash value could define a malicious file. Today, with advanced mutations, a combination of geolocation, ISP/source reputation, patterns of activity, trails, and innumerable other parameters needs to be considered.

As the number of variables increases, rule sets can no longer separate malicious activity from normal activity in a straightforward way, which makes AI/ML the better choice.
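The limitation of a hash-based rule can be shown in a few lines. The sketch below assumes a blocklist of SHA-256 digests; the payload bytes and the names `KNOWN_BAD` and `is_known_bad` are made up for illustration.

```python
import hashlib

# Illustrative stand-in for a malicious file's contents.
payload = b"example malicious payload"

# Hypothetical blocklist holding the SHA-256 digest of the known sample.
KNOWN_BAD = {hashlib.sha256(payload).hexdigest()}


def is_known_bad(data: bytes) -> bool:
    """Static rule: flag the file only if its digest is on the blocklist."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD


original_hit = is_known_bad(payload)        # exact copy of the known sample
mutated_hit = is_known_bad(payload + b" ")  # same sample with one byte appended
```

The exact copy is caught, but appending a single byte produces a different digest and the mutated sample slips through. This is why a single static attribute is no longer enough and many attributes have to be weighed together.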

Conversational systems

User interfaces to digitised services are becoming increasingly conversational. Conversations are unstructured (unlike clicking a check-box). Voice assistants, chatbots for customer service, and e-mail based automation of quotes, sales leads, or product inquiries are increasingly part of business accessibility and automation.

To protect such automated services, we need the same level of conversational intelligence.

Security vs privacy

Hiding personal information in the observed data while still deriving intelligence and insights that support near-accurate decisions is not an easy task.

Anonymisation adds layers of cleaning and introduces sparsity into the dataset. AI/ML tends to cope with this type of dataset better than static rules do.
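One common way to hide an identifier while keeping it usable for pattern analysis is keyed hashing (pseudonymisation). The sketch below is an assumption about how such a cleaning layer might look, not a description of any particular product; `SECRET_KEY` and `pseudonymise` are hypothetical names, and the IP addresses come from the documentation range.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored securely and rotated.
SECRET_KEY = b"rotate-me-regularly"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash so records can still be linked."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token, for readability only


token_a = pseudonymise("192.0.2.10")
token_b = pseudonymise("192.0.2.10")
token_c = pseudonymise("192.0.2.11")
```

The same input always maps to the same token, so behaviour can still be correlated across events, but the raw address no longer appears in the dataset. Note that neighbouring addresses map to unrelated tokens, which is one way anonymisation removes structure that a simple rule would have relied on.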

How AI/ML is being used for cybersecurity

The new-age enemy does not wear a uniform, speak a different language, or look different. It therefore becomes necessary to identify which parameters can help separate the Good Samaritans from the bad apples.

There is no simple mathematical formula that tells us whether a person has malicious intent. However, if we track hundreds or thousands of data points and activities related to that person, we can create a suspicion profile with acceptable efficacy.
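As a rough illustration of a suspicion profile, the sketch below combines a handful of signals into a single score between 0 and 1 using a logistic function. The signal names, weights, and bias are invented for the example; in a real system there would be far more signals, and the weights would be learned from data, not written by hand.

```python
import math

# Hypothetical signals and hand-picked weights, for illustration only.
WEIGHTS = {
    "failed_challenges": 1.2,
    "bad_source_reputation": 2.0,
    "odd_hour_activity": 0.6,
}
BIAS = -3.0  # keeps the score low when no signal is present


def suspicion(signals: dict) -> float:
    """Combine weighted signals into a score between 0 and 1."""
    z = BIAS + sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


quiet_user = suspicion({})
noisy_user = suspicion({
    "failed_challenges": 1.0,
    "bad_source_reputation": 1.0,
    "odd_hour_activity": 1.0,
})
```

No single signal settles the question, but together they push the score from well below 0.1 for the quiet user to above 0.5 for the noisy one. The output is a degree of suspicion, not a verdict, which matches the “acceptable efficacy” described above.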

Behavioural analysis and anomaly detection

Adversaries mimic genuine users and, being machines, they do it tirelessly and at an unprecedented scale. The only way to find the malice among billions of internet transactions is to gather hundreds of attributes for every transaction, genuine or otherwise (such as source reputation, geolocation, IP, time of interaction, history, ability to pass a challenge, global traffic patterns, mutation mapping, encryption entropies, etc.).

This large-scale data then needs to pass through layers of cleaning, be examined for patterns, and finally be used to build a decision tree. However, the work cannot stop there. It is no longer feasible to defend a system with rule-based static policies alone. They have to be augmented with behavioural analysis focused on detecting anomalies in real time.

Suppose an office CCTV camera sends an unusually large number of tweets about the movements of newly recognised faces on a holiday. This could very well be a hiring drive organised on the holiday (benign), or an attacker who has taken over the camera to launch a denial-of-service attack on Twitter (malicious). AI/ML excels at understanding the context and correlating events with it, and at forming and improving an opinion on what is “normal” and what is an “anomaly”.
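A very small version of the “normal versus anomaly” opinion can be built from a baseline and a deviation score. The sketch below uses a z-score over past observations; the tweet counts, the threshold of 3, and the function names are assumptions made for the example, and real systems would model many attributes at once instead of a single count.

```python
from statistics import mean, stdev


def anomaly_score(history: list, observed: float) -> float:
    """How many standard deviations the observation is from the baseline."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def is_anomalous(history: list, observed: float, threshold: float = 3.0) -> bool:
    """Flag observations that sit far outside what has been seen so far."""
    return anomaly_score(history, observed) > threshold


# Hypothetical tweets-per-day counts from the camera on previous holidays.
holiday_tweets = [2, 0, 1, 3, 1, 2, 0, 1]

usual_day = is_anomalous(holiday_tweets, 2)
burst_day = is_anomalous(holiday_tweets, 40)
```

A count of 2 falls within the baseline, while a burst of 40 is flagged. The score only says that the behaviour is unusual; deciding whether it is a hiring drive or a hijacked camera still needs the surrounding context.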

TL;DR

A human is capable of perceiving a threat based on experience. Today’s cyber systems need the same ability to perceive threats at a large scale to keep up with the growth in digitisation. They use AI/ML to build that human-like intuition and threat perception through behavioural analysis. The only difference is that, unlike humans, machines can learn faster and work tirelessly.

(Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the views of YourStory.)