Machine learning (ML) has become foundational across many industries over the past few years. Supervised learning currently accounts for the lion’s share of machine learning in practice. It is concerned with prediction, such as whether a customer will churn, and with classification, such as whether an email is spam or whether a tumour in a diagnostic image is benign. You will need to leverage machine learning in one form or another to keep your competitive advantage. Here are a few things a business leader needs to know about machine learning to leverage it effectively.
1. Machine Learning Models Learn from Data
The first and foremost thing you need to know about machine learning is that it brings a new paradigm of software development, Software 2.0, which is less about giving computers specific instructions on what to do in every situation they encounter and more about specifying high-level models that learn from data, known as training data. Consider the ML challenge of classifying a tumour in a diagnostic image as benign or malignant.
In classical Software 1.0, you would write code stating explicitly that a tumour above a given size and of a certain texture, among other conditions, is classified as malignant. In Software 2.0, you specify the type of algorithm you wish to use and feed it labelled data, that is, images which have already been classified as benign or malignant. The algorithm then finds patterns in this data and generalises the classification to new, unlabelled data. Most of the time, humans hand-label the training data, and for this reason some researchers prefer the term ‘recycled intelligence’ to ‘artificial intelligence’: the machine is merely recycling the human intelligence contained in the labelled examples, not creating any new form of intelligence.
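The contrast between the two paradigms can be sketched in a few lines of Python. This is a minimal illustration, not a real diagnostic method: the single `size_mm` feature, the hand-picked 20 mm rule, and the eight labelled examples are all made up for the example, and the "learning" is just a search for the size threshold that best separates the labels.

```python
# Software 1.0: a person writes the rule. The 20 mm cut-off is a made-up
# number for illustration, not a clinical guideline.
def rule_based_classifier(size_mm):
    return "malignant" if size_mm > 20.0 else "benign"


# Software 2.0: the rule is learned from hand-labelled examples.
def learn_size_threshold(examples):
    """Return the size threshold that misclassifies the fewest examples.

    examples: list of (size_mm, label) pairs with at least two distinct
    sizes; label is "benign" or "malignant".
    """
    sizes = sorted({size for size, _ in examples})
    # Candidate thresholds sit halfway between neighbouring observed sizes.
    candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]

    def errors(threshold):
        return sum(
            ("malignant" if size > threshold else "benign") != label
            for size, label in examples
        )

    return min(candidates, key=errors)


# Hypothetical training data: (size in millimetres, human-assigned label).
labelled_data = [
    (4.0, "benign"), (7.5, "benign"), (9.0, "benign"), (12.0, "benign"),
    (15.0, "malignant"), (18.5, "malignant"), (22.0, "malignant"),
    (30.0, "malignant"),
]

threshold = learn_size_threshold(labelled_data)


def learned_classifier(size_mm):
    return "malignant" if size_mm > threshold else "benign"


print("learned threshold:", threshold)
print("14 mm, hand-written rule:", rule_based_classifier(14.0))
print("14 mm, learned rule:", learned_classifier(14.0))
```

The learned rule is only as good as the human labels it was given, which is the sense in which the intelligence is ‘recycled’.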
2. Optimisation Function is the Key
In addition to labelled training data, you need to supply a machine learning model with an optimisation function, also known as an objective function or loss function. It tells the algorithm what you’re optimising for. A commonly used optimisation metric is accuracy: the percentage of your data for which the model makes the correct prediction. There are many situations where you should be more careful than blindly optimising for accuracy; the most prevalent is when your data is imbalanced.
For instance, suppose you’re building a spam filter and only 1% of emails are spam. With 1% of the data spam and 99% not, the classes are imbalanced. A model which classifies all emails as non-spam then has an accuracy of 99%. Although that sounds great, it is a meaningless model because it never catches a single spam email. Similarly, if more men than women apply for the same job, merely optimising for accuracy can lead to gender bias.
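The arithmetic behind the spam example can be checked with a short sketch. The mailbox of 1,000 emails is hypothetical, and recall (the share of actual spam that gets caught) is a metric brought in here only to make the failure visible; the article itself discusses accuracy alone.

```python
# Hypothetical mailbox: 1,000 emails, 1% of which are spam.
labels = ["spam"] * 10 + ["not spam"] * 990

# A "model" that ignores its input and always predicts the majority class.
predictions = ["not spam" for _ in labels]


def accuracy(y_true, y_pred):
    """Share of all emails that were classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive="spam"):
    """Share of the actual positives that the model caught."""
    actual_positives = sum(t == positive for t in y_true)
    caught = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    return caught / actual_positives


print(f"accuracy:    {accuracy(labels, predictions):.2%}")  # 99.00%
print(f"spam recall: {recall(labels, predictions):.2%}")    # 0.00%
```

A model can therefore score 99% on accuracy while doing none of the job it was built for, which is why the choice of what to optimise matters.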
3. Split Your Data
Readers may ask, “How do we calculate the accuracy of the model if we’ve already used all our labelled data to train it?” This is a key point. If you train your model on a dataset, you can expect it to perform better on that data than on new data it has not seen. To cope with this, you split the data into a training set and a test set before training the model. This procedure is known as a ‘train test split’. After training the model on the training set, you compute the accuracy on the test set, which the model hasn’t seen yet. Consequently, the computed scores have a better chance of generalising to new data.
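A train test split can be sketched with the Python standard library alone. The helper names, the 25% test fraction, the toy data, and the deliberately poor "memorising" model are all assumptions made for illustration; in practice a library routine such as scikit-learn's `train_test_split` would normally be used.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, rows):
    """Share of (feature, label) rows the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical labelled data: (feature, label) pairs.
data = [(x, "malignant" if x > 13 else "benign") for x in range(1, 41)]
train, test = train_test_split(data)

# A deliberately bad model: it memorises the training rows and answers
# "benign" for anything it has not seen before.
memory = dict(train)


def memorising_model(x):
    return memory.get(x, "benign")


print("rows in train/test:", len(train), len(test))
print("accuracy on training set:", accuracy(memorising_model, train))
print("accuracy on test set:    ", accuracy(memorising_model, test))
```

The memorising model is perfect on the rows it was trained on, so only the score on the held-out test set says anything about how it would behave on new data.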
4. Importance of Solid Data Foundations and Tooling
Having high-quality data is a huge challenge. If executives ask how they can make their companies ML-driven, try responding by showing them Monica Rogati’s AI Hierarchy of Needs, which has machine learning close to the top as one of the final pieces of the puzzle. It shows that before machine learning can happen, you need solid data foundations and tools for extracting, loading, and transforming data, as well as tools for cleaning and aggregating data from disparate sources.
5. Biased Data Influences Algorithms
The key is that machine learning can only be as effective as the data you feed it. If your data is biased, your model is likely to be as well. For instance, if you’re creating a machine learning recruiting tool to predict the success of applicants based on their curricula vitae, and your training data is biased against women, your tool will also show bias against women. A similar incident occurred at Amazon. As Cassie Kozyrkov has analogised, a teacher is only as good as the books they use to teach their pupils. If the books contain biases, the lessons will too.
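How bias in the labels carries over into a model can be shown with a toy sketch. Everything here is hypothetical: the hiring records are invented, and the "model" is simply the historical hire rate per group, chosen because it makes the mechanism easy to see rather than because any real recruiting tool works this way.

```python
from collections import defaultdict

# Hypothetical historical records: (gender, qualified, hired).
# Every applicant is equally qualified, but women were hired less often.
# That difference is the bias baked into the labels.
history = (
    [("man", True, True)] * 80 + [("man", True, False)] * 20
    + [("woman", True, True)] * 40 + [("woman", True, False)] * 60
)


def fit_hire_rates(records):
    """'Train' by measuring how often each group was hired in the past."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for gender, _qualified, was_hired in records:
        total[gender] += 1
        hired[gender] += was_hired  # True counts as 1, False as 0
    return {gender: hired[gender] / total[gender] for gender in total}


def predict_success(rates, gender):
    """Predict success when the group's learned hire rate exceeds 50%."""
    return rates[gender] > 0.5


rates = fit_hire_rates(history)
print("learned hire rates:", rates)
print("predicted success, man:  ", predict_success(rates, "man"))
print("predicted success, woman:", predict_success(rates, "woman"))
```

Nothing in the code mentions a preference for either group; the model simply reproduces the pattern in the historical decisions it was given.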