PyCaret is an open-source, low-code machine learning library and end-to-end model management tool built-in Python for automating machine learning workflows. Its ease of use, simplicity, and ability to quickly and efficiently build and deploy end-to-end machine learning pipelines will amaze you.
PyCaret is an alternate low-code library that can replace hundreds of lines of code with few lines only. This makes the experiment cycle exponentially fast and efficient.
PyCaret is simple and easy to use. All the operations performed in PyCaret are sequentially stored in a Pipeline that is fully automated for deployment. Whether it’s imputing missing values, one-hot-encoding, transforming categorical data, feature engineering, or even hyperparameter tuning, PyCaret automates all of it. To learn more about PyCaret, watch this 1-minute video.
Features of PyCaret
Image by Author
Modules in PyCaret
PyCaret is a modular library arranged into modules and each module representing a machine learning use-case. As of the writing of this story, the following modules are supported:
Image by Author — Machine Learning use-case supported in PyCaret
* Time Series module is in making and will be available in the next major release.
Installing PyCaret
Installing PyCaret is very easy and takes only a few minutes. We strongly recommend using a virtual environment to avoid potential conflicts with other libraries.
PyCaret’s default installation is a slim version of pycaret that only installs hard dependencies listed here.
# install slim version (default)
pip install pycaret
# install the full version
pip install pycaret[full]
When you install the full version of pycaret, all the optional dependencies as listed here are also installed.
PyCaret by numbers — Image by author
Let’s get started
Before I show you how easy it is to do machine learning with PyCaret, let’s talk a little bit about the machine learning lifecycle at a high level:
Machine Learning Life Cycle — Image by Author (Read from left-to-right)
Business Problem — This is the first step of the machine learning workflow. It may take from few days to a few weeks to complete, depending on the use case and complexity of the problem. It is at this stage, data scientists meet with subject matter experts (SME’s) to gain an understanding of the problem, interview key stakeholders, collect information, and set the overall expectations of the project.
Data Sourcing & ETL — Once the problem understanding is achieved, it then comes to using the information gained during interviews to source the data from the enterprise database.
Exploratory Data Analysis (EDA) — Modeling hasn’t started yet. EDA is where you analyze the raw data. Your goal is to explore the data and assess the quality of the data, missing values, feature distribution, correlation, etc.
Data Preparation — Now it’s time to prepare the data model training. This includes things like dividing data into a train and test set, imputing missing values, one-hot-encoding, target encoding, feature engineering, feature selection, etc.
Model Training & Selection — This is the step everyone is excited about. This involves training a bunch of models, tuning hyperparameters, model ensembling, evaluating performance metrics, model analysis such as AUC, Confusion Matrix, Residuals, etc, and finally selecting one best model to be deployed in production for business use.
Deployment & Monitoring — This is the final step which is mostly about MLOps. This includes things like packaging your final model, creating a docker image, writing the scoring script, and then making it all work together, and finally publish it as an API that can be used to obtain predictions on the new data coming through the pipeline.
Image by Author
There is no limit to what you can achieve using this lightweight workflow automation library in Python. If you find this useful, please do not forget to give us ⭐️ on our GitHub repository.
Resource: Towards Data Science - Moez Ali
The Tech Platform
Comments