How machine learning differs from traditional software – BL on Campus

In the last column, I discussed the difference between artificial intelligence and machine learning, and referred to narrow intelligence, which is a machine’s ability to perform a single task very well. Now, let’s look at a more formal definition of machine learning. The following definition is from Professor Tom Mitchell of Carnegie Mellon University:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Among the several definitions of machine learning, this one stands out because of its practicality for managers. Leaving aside the underlying technical details and complexity for a moment, this helps managers make sense of an AI system in terms of:

What decisions the software needs to make (T)
What data to collect (E)
How they will evaluate its results (P)
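To make T, E, and P concrete, here is a minimal Python sketch built on a hypothetical toy task: predicting whether a student passes from the hours they studied (T), learning from labelled past examples (E), and measuring accuracy on a fixed set of test cases (P). The learner is a decision stump, a one-rule model that picks the cutoff on hours that best fits the examples it has seen. The data, the names, and the true cutoff of 5 hours are all invented for illustration.

```python
# Hypothetical toy problem, invented for illustration:
#   T (task):        predict pass/fail from hours studied
#   E (experience):  labelled examples (hours, passed)
#   P (performance): accuracy on a fixed set of test cases

def accuracy(threshold, examples):
    """P: share of examples where 'hours >= threshold' matches the real outcome."""
    correct = sum((hours >= threshold) == passed for hours, passed in examples)
    return correct / len(examples)

def train_stump(examples):
    """Learn a decision stump: try each observed hours value as the cutoff
    and keep the first one with the highest accuracy on the examples."""
    best_threshold, best_acc = None, -1.0
    for threshold in sorted({hours for hours, _ in examples}):
        acc = accuracy(threshold, examples)
        if acc > best_acc:
            best_threshold, best_acc = threshold, acc
    return best_threshold

# E: experience arrives over time (the hidden true rule: pass if hours >= 5)
experience = [(1, False), (9, True), (3, False), (7, True),
              (4, False), (5, True), (6, True), (2, False)]
# Fixed test cases used to measure P
test_set = [(h, h >= 5) for h in (1, 2, 3, 4, 4.5, 5, 5.5, 6, 7, 8)]

curve = []
for n in (2, 4, 8):
    threshold = train_stump(experience[:n])
    curve.append((n, threshold, accuracy(threshold, test_set)))
    print(f"{n} examples -> cutoff {threshold}, accuracy {curve[-1][2]:.1f}")
```

With 2, 4, and 8 examples, the learned cutoff moves from 9 to 7 to 5 hours and test accuracy rises from 0.5 to 0.7 to 1.0: performance at T, as measured by P, improves with experience E. Real data is noisier, so the improvement is rarely this tidy.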

Key differences

This definition also hints at the differences between traditional software programs and machine learning programs. For one, regular software (for example, your word processing tool, or enterprise software such as CRM or ERP systems) is expected to work the same way each time you use it. In machine learning, however, we expect the system to get better with usage. For example, if you use speech recognition software, you expect it to improve over time with more usage, even if it does not recognise all your words at first.
There is another fundamental difference between the traditional programming approach and the machine learning approach. In conventional programming, application logic, conditions, and business rules are specified and coded upfront. When the software is deployed in production (i.e., when it is used in real life), these hard-coded rules are applied to the data to arrive at the output.
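In the conventional approach, a developer writes the rules before deployment. A minimal Python sketch, using a hypothetical loan-approval rule (the function name and the cutoffs of 650 and 30,000 are invented for illustration):

```python
def approve_loan(annual_income: float, credit_score: int) -> bool:
    """Traditional software: the business rules are decided and coded upfront.
    The cutoffs below are hypothetical, chosen by a person, not learned from data."""
    if credit_score < 650:          # rule 1: minimum credit score
        return False
    return annual_income >= 30_000  # rule 2: minimum income

# In production, the same hard-coded rules are applied to every new applicant.
applicants = [(45_000, 700), (45_000, 600), (20_000, 720)]
decisions = [approve_loan(income, score) for income, score in applicants]
print(decisions)  # [True, False, False]
```

The program behaves identically every time it runs; changing its behaviour means a developer editing the rules by hand.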

In machine learning, by contrast, the rules are not hard-coded. Instead, the machine learning (ML) program infers the rules from the underlying data, in two steps.
The first step, referred to as the training phase, consists of feeding data, along with the expected outcome for each data example, to the ML program. We can think of the output from this phase as a prediction model (i.e., a set of rules). Note that model building (or training) is iterative: different ML techniques are tried out to see which one performs better. In general, the more data available to train the model, the better it performs. How is the model’s performance tested? The data is divided into two portions, called training and test (or holdout). The training portion is used to develop the model, and the holdout portion, which the model never sees during training, is used to test its accuracy.
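The training phase, including trying more than one technique and checking each on holdout data, can be sketched in Python. This is a minimal illustration under stated assumptions: the dataset (hours studied versus pass/fail, with a few deliberately noisy labels), the 75/25 split, and the two candidate techniques (always predicting the most common label, and a one-rule decision stump) are all hypothetical choices, not a recommended recipe.

```python
import random

# Hypothetical labelled data: (hours studied, passed?). A few labels are "noisy".
DATA = [(0.5, False), (1.0, False), (1.5, False), (2.0, False), (2.5, True),
        (3.0, False), (3.5, False), (4.0, False), (4.5, True), (5.0, True),
        (5.5, True), (6.0, False), (6.5, True), (7.0, True), (7.5, True),
        (8.0, True), (8.5, True), (9.0, True), (9.5, True), (10.0, True)]

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle, then hold out a fraction of the rows for testing only."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]   # (training, holdout)

def fit_majority(train):
    """Technique 1 (baseline): always predict the most common training label."""
    majority = sum(label for _, label in train) * 2 >= len(train)
    return lambda hours: majority

def fit_stump(train):
    """Technique 2: learn the cutoff on hours that best fits the training data."""
    def correct(t):
        return sum((hours >= t) == label for hours, label in train)
    cutoff = max(sorted({hours for hours, _ in train}), key=correct)
    return lambda hours: hours >= cutoff

def accuracy(model, rows):
    return sum(model(hours) == label for hours, label in rows) / len(rows)

train, holdout = train_test_split(DATA)
# Each technique is trained on the training portion only...
models = {"majority": fit_majority(train), "stump": fit_stump(train)}
# ...and compared on the holdout portion, which it has never seen.
scores = {name: accuracy(model, holdout) for name, model in models.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

On a dataset this small, the scores swing with the random split; in practice, larger holdout sets or repeated splits (cross-validation) give a steadier comparison.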

The second step comes after the model is created: it is deployed to production, and “inference” refers to using the model to make predictions in real usage scenarios. Note that deploying ML models to production is a complex topic in itself, and there is an emerging field, called ML Ops or AI Ops, that focuses on it.
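Once training is done, what goes to production is the learned set of rules, stored in some form, and inference applies those rules to new records whose outcome is unknown. A minimal Python sketch, assuming a hypothetical JSON format for a one-rule model (real projects typically use their ML framework’s own save and load mechanism):

```python
import json

# Hypothetical output of the training phase: for a one-rule model,
# the learned "rule" is just a feature name and a cutoff.
artifact = json.dumps({"feature": "hours_studied", "threshold": 5.0})

def load_model(serialized: str) -> dict:
    """Production side: load the stored model once at start-up."""
    return json.loads(serialized)

def predict(model: dict, record: dict) -> bool:
    """Inference: apply the learned rule to a new record with no known outcome."""
    return record[model["feature"]] >= model["threshold"]

model = load_model(artifact)
incoming = [{"hours_studied": 3.0}, {"hours_studied": 7.5}]
predictions = [predict(model, r) for r in incoming]
print(predictions)  # [False, True]
```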
Employing data effectively
The performance of the model can degrade after deployment if the data it encounters differs from the data it was trained on. This underscores two key points. First, post-deployment monitoring of model performance is important and should be part of your plan. In this sense, a machine learning project is an ongoing effort: things change constantly in the real world, and models need updating in response. For example, facial recognition systems struggled when people started wearing masks during the Covid-19 pandemic.
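Post-deployment monitoring can start as simply as periodically scoring the model on recently labelled production data and comparing the result with the accuracy measured on the holdout set before launch. A minimal Python sketch; the tolerance of 5 percentage points, the deployed rule, and the sample data are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

@dataclass
class MonitorReport:
    baseline: float   # accuracy on the holdout set at training time
    current: float    # accuracy on recent, labelled production data
    degraded: bool    # True if current has fallen below baseline - tolerance

def check_model(predict: Callable[[float], bool],
                recent: Iterable[Tuple[float, bool]],
                baseline: float,
                tolerance: float = 0.05) -> MonitorReport:
    recent = list(recent)
    correct = sum(predict(x) == y for x, y in recent)
    current = correct / len(recent)
    return MonitorReport(baseline, current, current < baseline - tolerance)

def deployed(hours: float) -> bool:
    """Hypothetical rule learned before launch: pass if hours >= 5."""
    return hours >= 5.0

# Hypothetical recent data after the real world changed: passing now needs ~7 hours.
recent = [(4.0, False), (5.5, False), (6.0, False), (7.0, True), (8.0, True)]
report = check_model(deployed, recent, baseline=0.95)
print(report)  # MonitorReport(baseline=0.95, current=0.6, degraded=True)
```

When the check fails, the usual next steps are to investigate what changed and to retrain the model on fresher data.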
Second, it is not just the quantity but also the quality of data that matters. The data used for training should be representative of the real-world data. If there is not enough sample data for a particular category, the model won’t perform well for that category, even if it performs well for other categories.
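One simple way to spot this gap before training is to check how often each category the model must handle actually appears in the training data. A minimal Python sketch; the category names, counts, and the 10% minimum share are hypothetical and would depend on the problem:

```python
from collections import Counter

def coverage_gaps(train_labels, expected_categories, min_share=0.10):
    """Return categories whose share of the training data is below min_share,
    including categories that never appear at all (share 0.0)."""
    counts = Counter(train_labels)
    total = len(train_labels)
    return {cat: counts[cat] / total
            for cat in expected_categories
            if counts[cat] / total < min_share}

# Hypothetical training set for an image classifier
labels = ["cat"] * 50 + ["dog"] * 45 + ["fox"] * 5
print(coverage_gaps(labels, ["cat", "dog", "fox", "owl"]))
# {'fox': 0.05, 'owl': 0.0}
```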
This seems innocuous enough, but it can lead to seriously harmful real-world outcomes. To continue with the facial recognition example, the error rate of many commercially available systems is much higher for dark-skinned people and women because their training data consists largely of images of fair-skinned people and men. When police departments rely on such error-prone systems, the result can be unfair outcomes. This is an example of AI bias, or AI discrimination. Make sure you understand the data used for any model.
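Checking how a model performs across groups means measuring error rates separately for each group rather than relying on one overall figure, which can hide a large gap. A minimal Python sketch with hypothetical group labels and predictions:

```python
from collections import defaultdict

def error_rate_by_group(records):
    """records: (group, predicted, actual) triples -> {group: error rate}."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        errors[group] += int(predicted != actual)
    return {group: errors[group] / totals[group] for group in totals}

# Hypothetical face-matching results: 1 = match, 0 = no match
records = [("group A", 1, 1), ("group A", 0, 0), ("group A", 1, 1), ("group A", 0, 0),
           ("group B", 1, 0), ("group B", 0, 0), ("group B", 1, 1), ("group B", 0, 1)]
overall = sum(p != a for _, p, a in records) / len(records)
print(overall)                       # 0.25
print(error_rate_by_group(records))  # {'group A': 0.0, 'group B': 0.5}
```

The overall error rate of 25% looks moderate, but every error falls on group B.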
It is often said that data is the new oil. But just like oil, before it can be useful, it needs to be cleaned, processed, and refined.