Efficient Machine Learning with Lazypredict

Emine Bozkus
8 min read · Dec 27, 2022


Lazypredict is a Python library that simplifies the process of training and evaluating machine learning models for classification and regression tasks. It builds on popular machine learning libraries such as scikit-learn and XGBoost, and lets you train and compare many models quickly without writing a lot of code.

To use Lazypredict, first install the library using pip:

pip install lazypredict

Once you have installed Lazypredict, you can train and evaluate machine learning models for classification and regression tasks in just a few lines of code.

The examples below use two datasets that ship with scikit-learn. Start with the imports:

import numpy as np
from sklearn import datasets
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from lazypredict.Supervised import LazyClassifier, LazyRegressor

Regression

# Import the Diabetes Dataset
diabetes = datasets.load_diabetes()
diabetes
# Shuffle the Dataset
X, y = shuffle(diabetes.data, diabetes.target, random_state=13)
X = X.astype(np.float32)
# Split the Dataset
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

The code above loads scikit-learn's diabetes dataset, shuffles it, and holds out the last 10% of the rows as a test set. The next step uses the LazyRegressor class from Lazypredict to compare regression models on this split.

The LazyRegressor class is similar to the LazyClassifier class, but it is specifically designed for training and evaluating machine learning models for regression tasks.

To use the LazyRegressor class, create an instance and then pass the training and test features, followed by the training and test targets, to its fit() method. The constructor accepts optional parameters such as the verbose flag, which controls the level of output produced during training, and the custom_metric parameter, which lets you supply your own function for evaluating the models.
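As a minimal sketch of the custom_metric parameter, the function below computes the median absolute error. It assumes that Lazypredict calls the metric as custom_metric(y_true, y_pred) and adds the result as an extra column in the results table. The names median_absolute_error and compare_with_custom_metric are made up for this illustration and are not part of Lazypredict.

```python
import numpy as np


def median_absolute_error(y_true, y_pred):
    """Median of the absolute residuals; less sensitive to outliers than RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.median(np.abs(y_true - y_pred)))


def compare_with_custom_metric(X_train, X_test, y_train, y_test):
    """Hypothetical helper: run LazyRegressor with the metric above."""
    # Imported inside the function so the metric itself can be tried
    # without Lazypredict installed.
    from lazypredict.Supervised import LazyRegressor

    reg = LazyRegressor(
        verbose=0,
        ignore_warnings=True,
        custom_metric=median_absolute_error,  # assumed signature: (y_true, y_pred)
    )
    scores, _ = reg.fit(X_train, X_test, y_train, y_test)
    return scores


# Quick check of the metric on its own, using made-up numbers.
# Absolute errors are 10, 10 and 30, so the median is 10.0.
print(median_absolute_error([100, 150, 200], [110, 140, 230]))
```

Calling compare_with_custom_metric(X_train, X_test, y_train, y_test) with the diabetes split from above should return the usual results table with one additional column for the custom metric.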

Once you have created a LazyRegressor object, call its fit() method with the training and test data. Lazypredict then trains each supported regressor on the training set and evaluates it on the test set. The fit() method returns two pandas DataFrames: a table of evaluation metrics with one row per model and, if predictions=True was set, a table of each model's predictions on the test set.

In the example below, the results of fit() are stored in the models and predictions variables. Because predictions=False is passed here, only the models table of metrics is of interest.

# Train the LazyRegressor
reg = LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None, predictions=False, random_state=42)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
# Print the Models
print(models)
Model                         Adjusted R-Squared  R-Squared    RMSE  Time Taken
ExtraTreesRegressor                         0.38       0.52   54.33        0.20
OrthogonalMatchingPursuitCV                 0.37       0.52   54.39        0.02
Lasso                                       0.37       0.52   54.46        0.01
PassiveAggressiveRegressor                  0.37       0.51   54.74        0.02
LarsCV                                      0.37       0.51   54.81        0.08
LassoLarsIC                                 0.36       0.51   54.83        0.02
SGDRegressor                                0.36       0.51   54.85        0.01
Ridge                                       0.36       0.51   54.91        0.01
RidgeCV                                     0.36       0.51   54.91        0.01
BayesianRidge                               0.36       0.51   54.94        0.03
LassoLarsCV                                 0.36       0.51   54.96        0.03
TransformedTargetRegressor                  0.36       0.51   54.96        0.01
LinearRegression                            0.36       0.51   54.96        0.01
Lars                                        0.36       0.50   55.09        0.04
HuberRegressor                              0.36       0.50   55.24        0.02
RandomForestRegressor                       0.35       0.50   55.47        0.28
AdaBoostRegressor                           0.34       0.49   55.74        0.12
LGBMRegressor                               0.34       0.49   55.93        0.14
HistGradientBoostingRegressor               0.34       0.49   56.05        0.22
PoissonRegressor                            0.32       0.48   56.61        0.02
ElasticNet                                  0.30       0.46   57.49        0.01
KNeighborsRegressor                         0.30       0.46   57.57        0.01
OrthogonalMatchingPursuit                   0.29       0.45   57.87        0.01
BaggingRegressor                            0.29       0.45   57.96        0.04
XGBRegressor                                0.29       0.45   58.10        1.72
GradientBoostingRegressor                   0.25       0.42   59.70        0.15
TweedieRegressor                            0.24       0.42   59.81        0.01
GammaRegressor                              0.22       0.40   60.61        0.01
LassoLars                                   0.20       0.38   61.39        0.01
RANSACRegressor                             0.20       0.38   61.40        0.14
LinearSVR                                   0.12       0.32   64.66        0.01
ExtraTreeRegressor                          0.00       0.23   68.73        0.01
NuSVR                                      -0.07       0.18   71.06        0.03
SVR                                        -0.10       0.15   72.04        0.02
DummyRegressor                             -0.30      -0.00   78.37        0.01
QuantileRegressor                          -0.35      -0.04   79.84        2.64
DecisionTreeRegressor                      -0.47      -0.14   83.42        0.02
GaussianProcessRegressor                   -0.77      -0.37   91.51        0.04
MLPRegressor                               -1.87      -1.22  116.51        0.53
KernelRidge                                -5.04      -3.67  169.06        0.03
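The Adjusted R-Squared values in this output are noticeably lower than the R-Squared values. The short calculation below suggests why. It assumes the standard adjusted R-squared formula, evaluated on the test set, which is consistent with the numbers in the table but is an assumption about how Lazypredict computes the column. The helper name adjusted_r2 is made up for this illustration.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Standard adjusted R-squared: penalizes R-squared for the number of features."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


n_total = 442                  # rows in scikit-learn's diabetes dataset
n_train = int(n_total * 0.9)   # same 90% offset as in the split above
n_test = n_total - n_train     # rows left for the test set
n_features = 10                # feature columns in the diabetes dataset

print(n_test)
# R-Squared of the top-ranked model in the table (0.52), adjusted for a small test set.
print(round(adjusted_r2(0.52, n_test, n_features), 2))
```

With only 45 test rows and 10 features, the penalty factor (n - 1) / (n - p - 1) is 44 / 34, so an R-squared of 0.52 shrinks to about 0.38, which matches the first row of the table.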

Classification

# Load the Breast Cancer Dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the LazyClassifier
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

# Print the Models
print(models)

The LazyClassifier class is designed to simplify the process of training and evaluating machine learning models for classification tasks. To use it, create an instance and then pass the training and test features, followed by the training and test targets, to its fit() method. The constructor accepts optional parameters such as the verbose flag, which controls the level of output produced during training, and the custom_metric parameter, which lets you supply your own function for evaluating the models.

As with LazyRegressor, fit() trains each supported classifier on the training set, evaluates it on the test set, and returns two pandas DataFrames: a table of evaluation metrics with one row per model and, if predictions=True was set, a table of each model's predictions on the test set.

In the example above, the results of fit() are stored in the models and predictions variables, and print() displays the models table, which ranks the classifiers from best to worst.

Note that the models table holds scores, not estimator objects. To make predictions on new data you need the fitted pipelines themselves, which the LazyClassifier object keeps and exposes through its provide_models() method.
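A minimal sketch of retrieving one fitted pipeline by name is shown below. It assumes that provide_models() takes the same four arguments as fit() and returns a dictionary that maps model names to fitted pipelines, as shown in the Lazypredict README. The helper get_fitted_pipeline is hypothetical and not part of the library.

```python
def get_fitted_pipeline(lazy_estimator, model_name, X_train, X_test, y_train, y_test):
    """Hypothetical helper: return the fitted pipeline for one model name.

    `lazy_estimator` is expected to be a LazyClassifier or LazyRegressor,
    or any object with a compatible provide_models() method.
    """
    fitted = lazy_estimator.provide_models(X_train, X_test, y_train, y_test)
    if model_name not in fitted:
        raise KeyError(f"{model_name!r} not found; available: {sorted(fitted)}")
    return fitted[model_name]


# Intended usage with the objects from the classification example:
#   svc_pipeline = get_fitted_pipeline(clf, "SVC", X_train, X_test, y_train, y_test)
#   new_predictions = svc_pipeline.predict(X_test)
```

The returned object behaves like any fitted scikit-learn estimator, so predict() can be called on new rows that have the same columns as the training data.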

Output of print(models) for the classification example:

Model                         Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
BernoulliNB                       0.98               0.98     0.98      0.98        0.01
PassiveAggressiveClassifier       0.98               0.98     0.98      0.98        0.02
SVC                               0.98               0.98     0.98      0.98        0.03
Perceptron                        0.97               0.97     0.97      0.97        0.02
AdaBoostClassifier                0.97               0.97     0.97      0.97        0.18
LogisticRegression                0.97               0.97     0.97      0.97        0.06
SGDClassifier                     0.96               0.97     0.97      0.97        0.02
ExtraTreeClassifier               0.96               0.97     0.97      0.97        0.01
CalibratedClassifierCV            0.97               0.97     0.97      0.97        0.05
RandomForestClassifier            0.96               0.96     0.96      0.96        0.27
LGBMClassifier                    0.96               0.96     0.96      0.96        0.15
GaussianNB                        0.96               0.96     0.96      0.96        0.01
ExtraTreesClassifier              0.96               0.96     0.96      0.96        0.15
QuadraticDiscriminantAnalysis     0.96               0.96     0.96      0.96        0.02
LinearSVC                         0.96               0.96     0.96      0.96        0.05
BaggingClassifier                 0.96               0.95     0.95      0.96        0.07
XGBClassifier                     0.96               0.95     0.95      0.96        0.51
LinearDiscriminantAnalysis        0.96               0.95     0.95      0.96        0.04
NearestCentroid                   0.96               0.95     0.95      0.96        0.02
NuSVC                             0.96               0.95     0.95      0.96        0.04
RidgeClassifier                   0.96               0.95     0.95      0.96        0.02
RidgeClassifierCV                 0.96               0.95     0.95      0.96        0.03
KNeighborsClassifier              0.95               0.94     0.94      0.95        0.02
DecisionTreeClassifier            0.95               0.94     0.94      0.95        0.02
LabelSpreading                    0.94               0.93     0.93      0.94        0.04
LabelPropagation                  0.94               0.93     0.93      0.94        0.04
DummyClassifier                   0.62               0.50     0.50      0.48        0.01

The fit() call above does not train a single model. It trains every classifier that Lazypredict supports on the breast cancer dataset, from BernoulliNB to RandomForestClassifier, and evaluates each one on the test set. The Accuracy column of the results table gives each model's accuracy on the test set, so no separate scoring call is needed.
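Because the results table is a pandas DataFrame, ordinary pandas operations can be used to shortlist models. The sketch below rebuilds five rows of the classification output by hand (values copied from the table above) and keeps the models with an accuracy of at least 0.97, ordered by training time. The variable names are choices made for this illustration.

```python
import pandas as pd

# Five rows copied from the classification results above.
scores = pd.DataFrame(
    {
        "Accuracy": [0.98, 0.98, 0.97, 0.96, 0.96],
        "Time Taken": [0.01, 0.03, 0.18, 0.27, 0.51],
    },
    index=pd.Index(
        [
            "BernoulliNB",
            "SVC",
            "AdaBoostClassifier",
            "RandomForestClassifier",
            "XGBClassifier",
        ],
        name="Model",
    ),
)

# Keep models at or above the accuracy threshold, fastest first.
shortlist = scores[scores["Accuracy"] >= 0.97].sort_values("Time Taken")
print(shortlist)
```

The same two lines work on the models DataFrame returned by fit(), which makes it easy to trade a small loss in accuracy for a much shorter training time.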

By default, LazyClassifier already trains and evaluates every classifier it knows about, so a single fit() call returns a summary for all of them. If you only want to compare a few models, recent versions of Lazypredict let you pass a list of classifiers to the LazyClassifier object; check the documentation of your installed version for the exact parameter. Either way, the summary table is a quick way to compare the performance of different classifiers on a given dataset.
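To make concrete what such a comparison involves, here is a hand-rolled sketch of the same idea in plain scikit-learn. It is not Lazypredict's implementation: the function compare_classifiers, the three candidate models, and the StandardScaler step are all choices made for this illustration.

```python
import time

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(estimators, X_train, X_test, y_train, y_test):
    """Fit each estimator, score it on the test set, and tabulate the results."""
    rows = []
    for estimator in estimators:
        start = time.time()
        # Scale the features, then fit the model on the training set.
        pipeline = make_pipeline(StandardScaler(), estimator)
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        rows.append(
            {
                "Model": type(estimator).__name__,
                "Accuracy": accuracy_score(y_test, y_pred),
                "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
                "F1 Score": f1_score(y_test, y_pred, average="weighted"),
                "Time Taken": time.time() - start,
            }
        )
    table = pd.DataFrame(rows).set_index("Model")
    return table.sort_values("Balanced Accuracy", ascending=False)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
candidates = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    DecisionTreeClassifier(random_state=0),
]
results = compare_classifiers(candidates, X_train, X_test, y_train, y_test)
print(results.round(2))
```

Lazypredict saves you from writing and maintaining this loop and runs it over a much longer list of models, but the result has the same shape: one row of metrics per model.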

Conclusion

In conclusion, Lazypredict is a powerful and easy-to-use library that simplifies the process of training and evaluating machine learning models. Its ability to train and evaluate multiple models at once and its flexible customization options make it a valuable tool for any machine learning practitioner. In this notebook, we demonstrated how to use Lazypredict to train and evaluate a variety of machine learning models for both classification and regression tasks. We hope that this notebook has provided a useful introduction to the capabilities of Lazypredict and has demonstrated how it can be used to improve the efficiency of machine learning workflows.

In addition to Lazypredict, you can also use the tools below:

  • TPOT
  • Auto-Sklearn
  • Auto-ViML
  • H2O AutoML
  • Auto-Keras
  • MLBox
  • Hyperopt Sklearn
  • AutoGluon

These are all tools that can be used for automated machine learning (AutoML). Automated machine learning is a process of automating the end-to-end process of building machine learning models, from data preprocessing and feature engineering to model training and evaluation. These tools aim to make the process of building machine learning models more efficient and accessible to users with little or no experience in machine learning.

Here is a brief overview of each tool:

  • TPOT is a Python library that uses genetic programming to search for the best machine learning pipeline and hyperparameter values for a given dataset. It can be used for both classification and regression tasks.
  • Auto-Sklearn is a Python library that combines Bayesian optimization, meta-learning, and ensembling to search for the best machine learning model and hyperparameter values for a given dataset. It is built on the scikit-learn library and can be used for both classification and regression tasks.
  • Auto-ViML is a Python library that uses a combination of feature engineering and model selection to build machine learning models for classification and regression tasks. It is designed to work with a variety of machine learning libraries, including scikit-learn and XGBoost.
  • H2O AutoML is part of the H2O machine learning platform. It trains and tunes a range of models and builds stacked ensembles from them. It can be used for both classification and regression tasks and runs on H2O's distributed platform for faster model training.
  • Auto-Keras is a Python library that uses neural architecture search to find the best neural network architecture for a given dataset. It can be used for both classification and regression tasks and is designed to be easy to use.
  • MLBox is a Python library that covers data reading and preprocessing, feature selection, hyperparameter optimization, and prediction. It can be used for both classification and regression tasks.
  • Hyperopt Sklearn is a Python library that uses the search algorithms of Hyperopt, such as the Tree-structured Parzen Estimator, to find the best machine learning model and hyperparameter values for a given dataset. It is designed to work with the scikit-learn library and can be used for both classification and regression tasks.
  • AutoGluon is a Python library that trains multiple models and combines them through ensembling techniques such as bagging and stacking. It supports tabular, text, image, and time series data and can be used for both classification and regression tasks.

🌸Thanks for reading this article. You can access the detailed codes of the project and other projects on my Github account or Kaggle account. Happy coding!

Please feel free to contact me if you need any further information.🌸

References

  1. https://www.linkedin.com/feed/update/urn:li:activity:7013347166484119552/
  2. https://pypi.org/project/lazypredict/
