Machine Learning algorithms are trained on huge volumes of data, leverage massive compute, and find insights that are applied for dynamic pricing, fraud detection, user segmentation and sales forecasting among other use cases.
In this cheat sheet, we’ll dive into the top 10 Machine Learning (ML) algorithms for engineers and data science enthusiasts. We’ll cover the foundational algorithms that are used for classification, regression, and clustering tasks. These algorithms are a core part of the data scientist’s toolkit and are applied to various business problems. We will also put a spotlight on the current state of machine learning and what the future holds for 2020 and beyond.
State of Machine Learning in 2019
Types of Machine Learning Algorithms
10 Key Machine Learning (ML) algorithms
Support Vector Machine
Principal Component Analysis (PCA)
Gradient Boosting & AdaBoost
Machine Learning, a subset of artificial intelligence (AI), is one of the most transformative technologies. Recent advances in machine learning algorithms, along with Deep Learning, are having a massive impact on businesses and economies. According to a report, the global machine learning market is expected to reach $19.40 billion by 2023. One of the key reasons for the expansion of the market is accelerated data generation and the need to process huge volumes of data in a short time-span. Another key advantage machine learning offers is the ability to handle big data and new streams of data from Industry 4.0, edge devices, and rich applications. Additionally, a surge in computational power, advances in fully managed automated machine learning platforms such as AWS SageMaker and Azure ML, and expanding data storage options have spurred the adoption of machine learning in businesses. Some interesting applications of ML include chatbots, autonomous vehicles, retail management systems, healthcare networks, and advanced cybersecurity.
Machine Learning algorithms can be segmented into three broad categories — supervised, unsupervised, and reinforcement learning (RL).
Supervised Learning is leveraged in cases where a label is available for a specific dataset, but needs to be predicted for other instances for which it’s missing.
Unsupervised Learning can be used in cases where you need to discover implicit relationships in an unlabeled dataset.
Reinforcement Learning lies between these two extremes. It usually has some form of feedback for each predictive action/step, but no error message or precise label.
Now that we have seen the types of machine learning techniques, let’s take a look at 10 key machine learning algorithms.
Linear Regression analysis is used to predict the value of one variable (the dependent variable) based on the value of another (the independent variable). The analysis estimates the coefficients of a linear equation, involving one or more independent variables, that best predicts the value of the dependent variable.
The linear regression algorithm uses data points to find the best fit line to model the data.
A line can be represented by the equation y = m*x + c
where y and x are the dependent and independent variables respectively, m is the slope of the line, and c is its intercept. Basic calculus is then applied to find the values of m and c from the given data set.
This analysis fits a straight line (or, with several independent variables, a surface) that minimizes the discrepancies between the predicted and actual output values. Linear regression is available in a variety of environments, including scikit-learn in Python and R.
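To make the calculus step concrete, here is a minimal sketch in plain Python. Setting the derivatives of the squared error to zero gives closed-form expressions for m and c. The function name fit_line and the sample data are made up for illustration, and the sketch assumes a single independent variable.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Setting the derivatives of the squared error to zero yields:
    #   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    #   c = mean_y - m * mean_x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up data that lies exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```

On real data the points will not sit exactly on a line, and the same formulas return the line with the smallest total squared error.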
Linear Regression is perhaps the simplest machine learning algorithm, and as ML practitioner Jason Brownlee emphasized in his blog, Linear Regression belongs to both the “statistics and machine learning arena.” While it is essentially a powerful statistical model, it is widely applied in machine learning for predictive modeling.
Use Case: Common use cases include forecasting sales based on trends, risk assessment in financial services, house price predictions in real estate, and understanding customer behavior.
Learning Resource: Want to get started with Linear Regression analysis? Click here
Logistic Regression analysis, also known as the logit model, is used in predictive analytics and modeling and extends to machine learning applications. In this approach, the dependent variable is categorical, taking one of a finite set of values.
Binary regression – either A or B
Multinomial regression – A range of finite options A, B, C or D
Machine Learning utilizes statistical concepts to enable computers to learn without explicit programming. The logistic approach fits best when the task the machine is learning is a binary classification (one with two possible values).
Your computer can use this analysis to predict the likelihood of a choice being made or an event happening. For example, suppose you want to predict how likely a visitor is to choose an offer on your website. The machine can use logistic regression analysis to decide whether to promote your offer and then act on that decision itself. With more data, the model can be retrained and improves over time. Again, this is another statistical technique applied in machine learning for a range of classification tasks. Data analysts also use the Excel Analysis ToolPak to apply logistic regression models and make predictions.
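As a rough illustration of the website-offer example, the sketch below trains a one-feature logistic regression with plain gradient descent. The data (minutes spent on the page versus whether the offer was accepted), the function names, and the learning-rate and epoch settings are all assumptions chosen for the example, not values from any real system.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: minutes a visitor spent on the page, and whether
# they accepted the offer (1) or not (0).
minutes = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
accepted = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(minutes, accepted)
p_short = sigmoid(w * 1.0 + b)  # estimated probability for a 1-minute visit
p_long = sigmoid(w * 4.0 + b)   # estimated probability for a 4-minute visit
print(p_short, p_long)
```

The model outputs a probability rather than a hard label; thresholding it at 0.5 turns it into a binary classifier.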
Use Case: Primarily used for classification, some of the use cases include spam detection, credit card fraud detection, product categorization, and medical image classification.
Learning Resource: Here’s a great primer for getting started with Logistic Regression.
Decision Tree algorithms are decision support tools that use tree-like models or graphs of decisions and their possible consequences, such as outcomes based on chance events. Decision Trees enable you to tackle the issue at hand in a structured and systematic way to logically deduce the outcome. To work on complex, high-dimensional data, machine-learning-based decision trees come up with the cascade of questions automatically by looking at the labeled data.
Earlier versions such as CART trees were used for simple data; for larger datasets, the more advanced algorithms commonly used nowadays are Random Forests and Boosting Trees.
Random Forests build different classifiers on random subsets of attributes and combine them for output. Boosting trees train a cascade of trees one on top of another, with each tree automatically correcting the mistakes of the ones below it. This supervised method can be used to classify data points.
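To show how a tree can come up with a question automatically, here is a minimal sketch that picks the best single split on one numeric feature using Gini impurity, one common splitting criterion. A full tree would apply this search recursively to each side of the split. The loan data and function names are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    n = len(values)
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must put at least one point on each side
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Hypothetical loan data: applicant income (in thousands) and the decision
incomes = [20, 25, 30, 45, 50, 60]
approved = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(incomes, approved)
print(f"Ask: is income <= {threshold}? (weighted impurity {impurity})")
```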
Use Case: Decision Tree algorithms are applied to both classification and regression problems. Some of the use cases for Decision Trees include loan approvals and business decision-making.
Learning Resource: Check out Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart J. Russell
The K-means algorithm is an unsupervised learning method that is iterative and non-deterministic. It is used when you have unlabeled data, i.e., data without any defined groups or categories. The algorithm finds groups in the data, with the number of groups represented by the variable K. The data points are clustered based on feature similarity.
The K-means clustering algorithm results in:
The centroids of the K clusters, that may be used to label new data
Labels for the training data where each data point is assigned to a single cluster.
Instead of letting you define groups before looking at the data, clustering enables you to find and analyze groups that have been formed organically.
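The two outputs described above can be seen in a minimal sketch of the standard K-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. To keep the example repeatable, the starting centroids are fixed by hand here; in practice they are usually chosen at random, which is where the non-determinism comes from. The points and function names are made up.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points]


def kmeans(points, initial_centroids, iterations=10):
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Move the centroid to the mean of its assigned points
                centroids[k] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, assign(points, centroids)


# Made-up 2-D data with two visible groups, and K = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, labels = kmeans(points, initial_centroids=[(1, 1), (9, 8)])

# The centroids can now be used to label a new, unseen point
new_label = assign([(7, 7)], centroids)[0]
print(labels, new_label)
```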
Use Case: Google uses K-means to cluster pages by similarities and discover the relevance of search results. It’s a machine learning tool with a moderate training time and good accuracy. Because it has few parameters to tune, it’s easy to arrive at the best possible combination. The clustering tool is also applied for customer segmentation, identifying traffic patterns, and fraud detection.
Learning Resource: Here’s a great guide by AI heavyweight Andrew Ng and Adam Coates on K-means and its many advantages.
K Nearest Neighbor (KNN) is one of the most popular machine learning algorithms for classification and regression; it is simple, yet extremely effective. It finds applications in data mining, pattern recognition, and intrusion detection.
It uses prior (training) data in which coordinates are classified into groups identified by an attribute. In other words, the algorithm stores all available cases and classifies any new case by taking a majority vote of its k nearest neighbors. The case is then allotted to the class it matches the most. Closeness is measured by a distance function, such as Euclidean distance.
Although KNN needs almost no time to train, its accuracy may degrade on high-dimensional data, where there isn’t much of a difference between the nearest and farthest neighbor. It’s also computationally expensive at prediction time and requires the variables to be normalized.
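Here is a minimal sketch of the majority-vote idea in plain Python, assuming Euclidean distance and k = 3. The training points, labels, and function name are invented for illustration. Note that "training" is nothing more than keeping the list of stored cases.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored cases.

    `train` is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up training data: two groups, "A" and "B"
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
    ((6, 6), "B"), ((7, 7), "B"), ((6, 7), "B"),
]

print(knn_predict(train, (2, 1)))
print(knn_predict(train, (6, 5)))
```

Every prediction sorts the whole training set by distance, which is why KNN becomes expensive as the stored data grows.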
Use Case: Widely applied in retail for credit card usage detection, facial recognition and various search applications.
Learning Resource: Get started on this supervised machine learning algorithm with this tutorial.
Naïve Bayes is a classification algorithm based on Bayes’ Theorem in probability. It works on the assumption that each feature is independent of every other feature. Even if a relationship exists between features, it considers each one individually when predicting the probability of an outcome.
It is a popular learning algorithm for grouping similar items together. It’s used for document classification, sentiment analysis (checking whether text expresses positive or negative emotions), spam detection, facial recognition software, and disease prediction.
The model has a moderate training time and makes use of linearity. The Naïve Bayes algorithm is not only easy to use, it also handles massive data sets efficiently. It works quite well when you have a medium to large data set to train your model. This classification algorithm is applied in a wide range of domains.
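The independence assumption is easiest to see in a small text classifier. The sketch below is a word-count (multinomial) Naïve Bayes with add-one smoothing, written from scratch; the tiny spam/ham corpus and the function names are made up, and a real application would use far more data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the log prior P(class)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Each word contributes independently (the "naive" assumption);
            # add-one smoothing avoids zero probability for unseen words.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny made-up corpus
docs = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project status meeting", "ham"),
]
model = train_nb(docs)
print(predict_nb(model, "free prize offer"))
print(predict_nb(model, "monday project meeting"))
```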
Use Case: Document classification is one of the most popular use cases. Other applications include weather forecasting and fraud analysis.
Learning Resource: This Google Tech Developer Guide provides resources on Naive Bayes algorithm.
Support Vector Machines (SVMs) are supervised machine learning algorithms. SVMs are used for regression as well as classification problems. They’re often used in situations where the training datasets teach the algorithm about specific classes so that it can classify newly included data.
Each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature matched to a particular coordinate, making it easier to classify the data. Lines (or, in higher dimensions, hyperplanes) known as classifiers are then used to split the data into classes.
These algorithms are moderate in their accuracy and training times, as they assume a linear approximation. To get the work done, they need an average number of parameters. SVMs find application in image-based gender detection, display advertising, and image classification with large feature sets.
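For a feel of how such a separating line can be learned, here is a rough sketch of a linear SVM trained by sub-gradient descent on the hinge loss. This is one simple training approach, not how production SVM libraries are implemented, and the data, function names, and hyperparameters (lr, reg, epochs) are assumptions chosen for the example. Labels are encoded as -1 and +1.

```python
def train_linear_svm(points, labels, lr=0.001, reg=0.01, epochs=5000):
    """Learn weights w and bias b so that sign(w.x + b) separates the classes."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: push the
                # boundary away from it (hinge-loss sub-gradient step).
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only apply regularization,
                # which favors a wider margin.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable 2-D data
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(svm_predict(w, b, (1.5, 1.5)), svm_predict(w, b, (5.5, 5.5)))
```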
Use Case: Some real-life applications of SVMs include image classification, handwriting recognition, and text-classification.
Learning Resource: Here’s a good starting point on Support Vector Machines for text-classification.
Apriori is a go-to unsupervised algorithm for association rule mining. By sorting information, it enhances the data management process: users are apprised of new information, which helps them make sense of the data they are working with. It works on two basic principles: first, if an itemset occurs frequently, then all of its subsets also occur frequently; second, if an itemset occurs infrequently, then all of its supersets also occur infrequently.
From the given data set, this algorithm generates association rules. It follows a bottom-up approach in which frequent itemsets are extended one item at a time, and the algorithm terminates when no further extension is possible.
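The bottom-up search can be sketched in a few lines of Python. This version only finds the frequent itemsets (the step before rules are generated) and uses a minimum support expressed as a raw count; the shopping baskets and the function name are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least
    `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Extend frequent itemsets by one item...
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ...and prune any candidate with an infrequent subset
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent


# Made-up shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]
frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops on its own once no candidate of the next size survives pruning, which is the termination condition described above.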
The data can be entered into an artificial neural network or some other form of AI (artificial intelligence). However, it should be accompanied by guiding information such as dates or timestamps. The guiding information helps the machine learning algorithm process categories to find patterns. A bare list of numbers can only be sorted based on frequency or general amount.
It often works with huge data sets including thousands of entries of qualitative or quantitative data. This algorithm is easy to implement and can be run in parallel. However, it is generally expensive, slow and inefficient.
Use Case: One of the historical uses of the Apriori algorithm is Market Basket analysis: boosting sales by predicting what customers will put in their shopping basket. It can also be applied in healthcare to detect drug reactions. Google auto-complete is another popular application of the Apriori algorithm.
Learning Resource: Here’s a good primer on Apriori algorithm.
PCA is an unsupervised, non-parametric statistical technique commonly employed to reduce dimensionality in machine learning. It uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
Here are the steps involved in principal component analysis:
The first step in PCA is to standardize the data (to mean = 0 and variance = 1).
Then compute the covariance matrix of the dimensions.
Next, obtain the eigenvectors and eigenvalues of the covariance matrix.
Sort the eigenvalues in descending order and choose the k eigenvectors that correspond to the k largest eigenvalues (where k is the number of dimensions of the new feature subspace, k ≤ d, and d is the number of original dimensions).
Construct the projection matrix W from the selected k eigenvectors.
Transform X (the original data set) via W to get the new k-dimensional feature subspace Y.
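The steps above map almost line for line onto NumPy. The following is a minimal sketch, assuming NumPy is installed; the function name pca and the small two-column data set are made up for illustration, and the sample standard deviation (ddof=1) is used so that the standardized variance matches what np.cov computes.

```python
import numpy as np


def pca(X, k):
    # 1. Standardize each column to mean 0 and variance 1
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the dimensions (d x d)
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh suits symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Indices of the k largest eigenvalues, in descending order
    order = np.argsort(eigenvalues)[::-1][:k]
    # 5. Projection matrix W (d x k) from the selected eigenvectors
    W = eigenvectors[:, order]
    # 6. Project the standardized data onto the new k-dimensional subspace
    Y = X_std @ W
    return Y, W, eigenvalues[order]


# Made-up data: 10 observations of 2 correlated variables (d = 2)
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
Y, W, top_eigenvalues = pca(X, k=1)
print(Y.shape, W.shape)
```

Here the two original columns are reduced to a single component that captures most of their shared variation.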
It makes large data sets simpler and easier to explore and visualize. It also reduces the computational complexity of the model, which makes machine learning algorithms run faster. It is not suitable for noisy data (as all components of principal component analysis have high variance).
Use Case: This unsupervised algorithm finds use in classification problems and is also used for facial recognition.
Learning Resource: Here’s how to implement PCA in Azure ML Studio.
Boosting is an ensemble modeling technique that creates more powerful, more accurate models from weak ones. It involves building a model by using weak models in a series. First, a model is built from the training data. A second model is then built that tries to rectify the errors of the first, and so on. Gradient boosting algorithms can handle massive amounts of data accurately and speedily. There are multiple gradient boosting implementations, such as XGBoost (which supports both linear and tree-based learners) and LightGBM (which uses tree-based algorithms).
AdaBoost was the first successful boosting algorithm developed for binary classification. Also known as Adaptive Boosting, it combines multiple weak classifiers into a single strong classifier.
The main difference between AdaBoost and Gradient Boosting is that in AdaBoost, “shortcomings” are identified by high-weight data points, whereas in Gradient Boosting, shortcomings are identified by gradients.
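To illustrate how high-weight data points steer AdaBoost, here is a small from-scratch sketch that uses one-question decision stumps as the weak classifiers. After each round, the points the latest stump got wrong receive more weight, so the next stump concentrates on them. The one-dimensional toy data, the function names, and the choice of three rounds are assumptions for the example; labels are encoded as -1 and +1.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: `polarity` above the threshold, the opposite below."""
    return polarity if x > threshold else -polarity


def best_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in xs:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # start with every point equally important
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = max(error, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # this stump's say
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correct ones lose weight
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score > 0 else -1


# Toy data that no single stump can classify correctly
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single stump gets more than 7 of these 10 points right, but the weighted vote of three stumps classifies all of them correctly.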
Use Case: Dubbed as the algorithms for champions by practitioners (used in Kaggle competitions), some real-world applications include predicting customer churn, and ad click-through rates.
Learning Resource: This guide is useful for getting started on boosting algorithms. Also check out the paper on XGBoost by its creator Tianqi Chen.
We’ve only covered the basic theory that surrounds machine learning. To recap, we have covered supervised learning techniques like Linear Regression, Logistic Regression, Naïve Bayes and KNN. We also looked into unsupervised learning techniques such as Apriori, K-means, PCA and ensembling techniques like boosting.
To apply these theories to real-life machine learning problems, you will need to develop a deeper understanding of the topics discussed here and their intricacies. One of the major features of this revolution that stands out is how machine learning tools and techniques have been democratized. During the past few years, data scientists have developed sophisticated data-crunching models with automated machine learning tools, and the outcomes have been amazing.
Would you like to add any other ML algorithms to the list? Comment and let us know what you think about the key algorithms covered here on LinkedIn, Twitter, or Facebook. We would love to hear from you!