Complete Guide to Using AutoSklearn – Tool For Faster Machine Learning Implementations – Analytics India Magazine

  • Lauren
  • October 3, 2020

Automated machine learning (AutoML) tools can be a huge time saver, especially when the dataset is large or the task is a standard classification or regression problem. One such open-source AutoML tool is AutoSklearn (auto-sklearn). The popular sklearn library is widely used for building machine learning models, but with sklearn it is up to the user to choose the algorithm and tune its hyperparameters. With autosklearn, these steps are automated for the user. In addition to data preparation and model building, it learns from models that worked well on similar datasets and automatically builds ensemble models for better accuracy.
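To make the ensemble idea concrete, here is a simplified, dependency-free sketch of greedy ensemble selection (Caruana et al.), the technique auto-sklearn uses to combine models found during its search. The function name ensemble_selection and the toy validation predictions are hypothetical; the real library handles every class, optimizes whichever metric you chose, and returns ensemble weights rather than a plain list of names.

```python
def ensemble_selection(model_preds, y_true, size):
    """Greedily add `size` models (with replacement) to an ensemble, each time
    picking the model whose inclusion gives the best validation accuracy of
    the averaged positive-class probabilities."""

    def accuracy(probs):
        # Threshold averaged probabilities at 0.5 and compare with 0/1 labels
        hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, y_true))
        return hits / len(y_true)

    chosen = []                          # selected model names (may repeat)
    running_sum = [0.0] * len(y_true)    # summed probabilities of chosen models
    for _ in range(size):
        best_name, best_score = None, -1.0
        for name, preds in model_preds.items():
            k = len(chosen) + 1
            averaged = [(s + p) / k for s, p in zip(running_sum, preds)]
            score = accuracy(averaged)
            if score > best_score:       # ties keep the earlier model
                best_name, best_score = name, score
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, model_preds[best_name])]
    return chosen


# Made-up validation labels and positive-class probabilities for three models
y_val = [1, 0, 1, 0]
val_preds = {
    "a": [0.9, 0.6, 0.4, 0.1],   # 50% accurate on its own
    "b": [0.4, 0.1, 0.9, 0.6],   # 50% accurate on its own
    "c": [0.2, 0.8, 0.3, 0.7],   # 0% accurate on its own
}
print(ensemble_selection(val_preds, y_val, size=2))  # ['a', 'b']
```

Neither toy model a nor b is better than 50% accurate alone, but averaging their probabilities classifies all four validation examples correctly, which is why b is the second pick.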

In this article, we will see how to make use of autosklearn for classification and regression problems. 

Installing the package

Before we build models with autosklearn, we need to install the package in our working environment. On a Linux operating system, we can do this with pip:

pip3 install auto-sklearn

If you are using Colab, you will first need to install the build tools and the package requirements, and then auto-sklearn itself:

!sudo apt-get install build-essential swig
!curl | xargs -n 1 -L 1 pip install
!pip install auto-sklearn

This will install the library and we can move to the next step. 

AutoSklearn for classification problems

Now that we have everything we need, we can build a model using autosklearn for a classification problem. For this type of problem, we use the AutoSklearnClassifier class. Let us first select the dataset and then proceed with the model.

The dataset

I will use the wine quality dataset from the UCI Machine Learning Repository; you can download it from there if you want to follow along. Now let us load the dataset.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from autosklearn.classification import AutoSklearnClassifier
wine = pd.read_csv('')  # path to the downloaded wine quality CSV (not given in the original)

Splitting the dataset

Now, let us separate the features from the target (the last column) and split the data into training and test sets.

dataset = wine.values
ft, target = dataset[:, :-1], dataset[:, -1]
X_train, X_test, y_train, y_test = train_test_split(ft, target, test_size=0.2, random_state=1)
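The three lines above can be sketched without NumPy or scikit-learn as follows. This is an illustrative sketch, not scikit-learn's implementation: split_features_target and holdout_split are hypothetical helpers, and while the test-set size follows the same ceil(test_size * n) rule that train_test_split uses for a fractional test_size, the shuffling differs, so the selected rows will not match.

```python
import math
import random


def split_features_target(rows):
    """Mimic dataset[:, :-1], dataset[:, -1]: every column except the last
    is a feature, and the last column is the target."""
    features = [row[:-1] for row in rows]
    target = [row[-1] for row in rows]
    return features, target


def holdout_split(features, target, test_size=0.2, seed=1):
    """Shuffle row indices and hold out ceil(test_size * n) rows for testing,
    keeping each feature row paired with its target."""
    n = len(features)
    n_test = math.ceil(test_size * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [target[i] for i in train_idx]
    y_test = [target[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy table: 10 rows, 3 feature columns + 1 target column (made-up values)
rows = [[i, i * 2, i * 3, i % 2] for i in range(10)]
ft, target = split_features_target(rows)
X_train, X_test, y_train, y_test = holdout_split(ft, target)
print(len(X_train), len(X_test))  # 8 2
```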

Building the classification model

Since we are using auto-sklearn, we do not need to specify the algorithm or its hyperparameters. Both are chosen automatically, and the final result is reported to us.

autosk = AutoSklearnClassifier(time_left_for_this_task=60*2)  # search budget in seconds
autosk.fit(X_train, y_train)

time_left_for_this_task is the total time, in seconds, that the user allows for searching for suitable models. I have allowed the search to run for two minutes (60*2 seconds), but you can choose any budget you like.
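Conceptually, a time budget turns model selection into a search that keeps trying configurations until the time runs out. The sketch below illustrates only that idea: auto-sklearn itself uses Bayesian optimization (SMAC) warm-started by meta-learning, whereas this hypothetical budgeted_random_search simply samples configurations at random, and the candidates and scores are made up.

```python
import random
import time


def budgeted_random_search(evaluate, candidates, time_budget_s, seed=0):
    """Evaluate randomly sampled configurations until `time_budget_s` seconds
    have elapsed (always at least one), returning the best one found."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_config, best_score, n_runs = None, float("-inf"), 0
    while n_runs == 0 or time.monotonic() < deadline:
        config = rng.choice(candidates)      # random search stands in for SMAC
        score = evaluate(config)             # e.g. validation accuracy
        n_runs += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, n_runs


# Hypothetical (algorithm, hyperparameter) pairs with made-up validation scores
toy_scores = {("knn", 3): 0.71, ("knn", 7): 0.74, ("tree", 2): 0.65, ("tree", 5): 0.78}
best, score, runs = budgeted_random_search(
    toy_scores.get, list(toy_scores), time_budget_s=0.05
)
print(f"best={best} score={score} after {runs} runs")
```

A larger budget simply allows more runs, which is why giving auto-sklearn more time can find better models.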

Now we have the statistics of the model, which show that 21 algorithms were checked during the search. Let us now see the accuracy of the model.

print(autosk.sprint_statistics())  # summary of the search
pred = autosk.predict(X_test)
print("Accuracy score", accuracy_score(y_test, pred))
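accuracy_score returns the fraction of test predictions that match the true labels. A minimal hand-written equivalent with made-up quality labels looks like this (the accuracy helper is hypothetical, not part of sklearn):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up wine quality labels: 4 of the 5 predictions are correct
print(accuracy([5, 6, 6, 7, 5], [5, 6, 5, 7, 5]))  # 0.8
```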

This is a good score, given that we have neither scaled nor otherwise preprocessed the data ourselves and allowed the search to run for only 2 minutes. Thus, we have built a classification model using autosklearn.

AutoSklearn for regression problems

We have already seen how autosklearn works for classification problems. Next, let us apply it to a regression problem and check the results.

The dataset

For this, I will use the Boston housing dataset that is built into sklearn. The task here is to predict the price of houses in Boston from the given features. Let us now load the dataset.


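from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
import pandas as pd
import autosklearn.regression
X, y = load_boston(return_X_y=True)  # features and house prices as arrays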

Splitting the dataset

Let us split this dataset into train and test data using the train_test_split function of sklearn.
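The split code itself did not survive in this article. Below is a minimal sketch, assuming the same 80/20 split and random_state=1 as the wine example and the variable names xtrain, xtest, ytrain and ytest used later. Because load_boston was removed in scikit-learn 1.2, the sketch uses synthetic stand-in data from make_regression with the Boston data's shape (506 rows, 13 features); with the real data, pass its features and prices in place of X and y.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Boston data's shape (assumption, not the real data)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Assumed: same 80/20 split and random_state as the wine example
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=1)
print(xtrain.shape, xtest.shape)  # (404, 13) (102, 13)
```

With test_size=0.2, the test set gets ceil(0.2 * 506) = 102 rows and the remaining 404 go to training.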


Model building

Just as we used AutoSklearnClassifier for classification, we will use AutoSklearnRegressor for regression models.

regressor = autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=60*5)  # 5-minute budget
regressor.fit(xtrain, ytrain)

Here I have set the budget to 5 minutes (300 seconds) to see how a longer search affects the results.

Now, let us see the statistics of the model along with its error. Since this is a regression problem, we will use the mean absolute error as the metric.
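The mean absolute error is the average absolute difference between the true and predicted values, MAE = (1/n) Σ |y_i − ŷ_i|. Here is a small pure-Python version on made-up prices; mean_abs_error is a hypothetical helper, and in practice sklearn.metrics.mean_absolute_error computes the same quantity.

```python
def mean_abs_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - yhat_i|) over n paired values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up house prices (in $1000s) and model predictions
prices = [24.0, 21.5, 34.5, 33.0]
predicted = [25.0, 20.5, 31.5, 33.0]
print("MAE:", mean_abs_error(prices, predicted))  # MAE: 1.25
```

Because MAE is in the same units as the target, an MAE of 1.25 here means the predictions are off by $1,250 on average.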

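from sklearn.metrics import mean_absolute_error
print(regressor.sprint_statistics())  # summary of the search
pred = regressor.predict(xtest)
mae = mean_absolute_error(ytest, pred)
print("MAE:", mae)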

The mean absolute error is low, which means the model's predictions are close to the true prices and the model has performed very well. The statistics also report a validation score of 0.86; since auto-sklearn's default metric for regression is R², this is an R² of 0.86 rather than an accuracy, and it indicates a good fit. Within the 5 minutes, the search evaluated 57 algorithms.
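For reference, R² (the coefficient of determination) is 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. A score of 1.0 is a perfect fit, and 0.0 is no better than always predicting the mean. A short sketch with made-up values (r2 is a hypothetical helper; sklearn.metrics.r2_score computes the same quantity):

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # squared errors
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # spread around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


print(r2([0.0, 0.0, 4.0, 4.0], [1.0, 1.0, 3.0, 3.0]))  # 0.75
```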


In this article, we saw how to use autosklearn to build both classification and regression models without having to specify the algorithm. Both models gave good results. AutoSklearn can be really useful in business analytics and research for building better models faster.