Integrating Evidently AI with MLflow for ML Model Monitoring

Introduction

In machine learning, monitoring model performance is crucial for maintaining accuracy and reliability. In this article, we explore how to integrate Evidently AI with MLflow: Evidently AI generates a performance report for the model, and MLflow records the resulting metrics so they can be compared across runs. Let's walk through the steps required to accomplish this integration.

Installation

First, we need to install the required libraries. Make sure you have Evidently AI, MLflow, and a few other packages installed. You can use the following command in your terminal:

pip install evidently mlflow pandas scikit-learn

If you have them already installed, you will see messages indicating that the requirements are satisfied.
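
Evidently's Python API has changed between releases, so it can be useful to record which versions are actually installed before running the rest of the tutorial. The following is a minimal sketch using only the standard library; the helper name installed_versions is illustrative and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package used in this tutorial
for name, ver in installed_versions(["evidently", "mlflow", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Keeping this output next to your experiment notes makes it easier to reproduce a report later.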

Importing Libraries

Next, we will import the necessary libraries for our project:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from evidently import ColumnMapping
from evidently.report import Report
import mlflow

Loading and Preparing the Dataset

For this tutorial, we will use a student performance dataset hosted on Kaggle.

Load the dataset.
Use LabelEncoder to encode the result (target) feature.
Set up the input and output features for training and testing.
Encode categorical variables in the input features using the get_dummies method.
Split the dataset into training and testing sets.

The code to accomplish this is as follows:

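data = pd.read_csv('student_performance.csv')  # Replace with your dataset
label_encoder = LabelEncoder()
data['target'] = label_encoder.fit_transform(data['target_column'])  # Replace 'target_column' with actual column
# Drop the encoded target and the raw label column so the label does not leak into the features
X = data.drop(['target', 'target_column'], axis=1)
# One-hot encode the categorical input features
X = pd.get_dummies(X)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)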
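
To make the preparation steps concrete, here is a small self-contained sketch that runs them on a toy table. The data and the column names (gender, study_hours, result) are made up for illustration and are not the schema of the Kaggle dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; column names are illustrative only
toy = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M", "F", "M"],
    "study_hours": [5, 2, 7, 1, 6, 3, 8, 2, 4, 9],
    "result": ["pass", "fail", "pass", "fail", "pass",
               "fail", "pass", "fail", "pass", "pass"],
})

# LabelEncoder sorts the classes, so "fail" becomes 0 and "pass" becomes 1
encoder = LabelEncoder()
toy["target"] = encoder.fit_transform(toy["result"])

# Remove both the raw label and the encoded target from the features
features = toy.drop(columns=["result", "target"])
# One-hot encode the categorical column; numeric columns pass through unchanged
features = pd.get_dummies(features)

# 70/30 split with a fixed seed for reproducibility
X_tr, X_te, y_tr, y_te = train_test_split(
    features, toy["target"], test_size=0.3, random_state=42
)
print(sorted(features.columns), len(X_tr), len(X_te))
```

Note that the one-hot encoding is applied before the split, so the training and test frames end up with the same set of columns.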

Training the Logistic Regression Model

Now, we will train a logistic regression model and generate a performance report using Evidently AI:

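model = LogisticRegression()
model.fit(X_train, y_train)
# Positive-class probabilities (this assumes a binary target), computed before any extra column is added
train_proba = model.predict_proba(X_train)[:, 1]
test_proba = model.predict_proba(X_test)[:, 1]

We store these probabilities in a column named "prediction", which Evidently AI uses to analyze and generate reports about our model's performance.

Next, we save our target variable in both the training and testing datasets, since Evidently AI looks for a column named "target" by default.

X_train = X_train.assign(prediction=train_proba, target=y_train)
X_test = X_test.assign(prediction=test_proba, target=y_test)

Then, we pass the training set as reference data and the test set as current data to Evidently AI and save the report in JSON format locally. The calls below follow the legacy Evidently API, that is, the releases that expose evidently.report.Report and ColumnMapping as imported above. Newer releases reorganized these modules, so pin a compatible version or adjust the imports if they fail.

from evidently.metric_preset import ClassificationPreset
report = Report(metrics=[ClassificationPreset()])
report.run(reference_data=X_train, current_data=X_test,
           column_mapping=ColumnMapping(target='target', prediction='prediction'))
report.save_json('performance_report.json')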
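
Before reading metrics out of the report, it can help to see how a column of probabilities and a column of 0/1 targets turn into an accuracy value. The sketch below uses plain Python and made-up numbers. It assumes a binary target and a 0.5 decision threshold with ties counted as positive; both are assumptions for illustration, so check the threshold your Evidently version applies.

```python
def accuracy_at_threshold(probabilities, targets, threshold=0.5):
    """Share of rows where the thresholded probability matches the 0/1 target."""
    if len(probabilities) != len(targets):
        raise ValueError("probabilities and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("at least one row is required")
    # Turn each probability into a hard 0/1 label
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(int(p == t) for p, t in zip(predicted, targets))
    return correct / len(targets)


# Made-up values standing in for a probability column and a target column
example_proba = [0.91, 0.35, 0.62, 0.08, 0.55]
example_target = [1, 0, 0, 0, 1]
example_accuracy = accuracy_at_threshold(example_proba, example_target)
print(example_accuracy)
```

Recomputing the number this way on your own data gives a quick sanity check against the value reported by the library.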

Extracting Accuracy Metrics

To help us understand our model's accuracy, we need to extract metrics from the JSON report.

import json

with open('performance_report.json') as f:
    report_data = json.load(f)
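# In the legacy JSON layout, "metrics" is a list of {"metric": name, "result": {...}} entries.
# This assumes the report contains a ClassificationQualityMetric entry.
quality = next(m for m in report_data['metrics'] if m['metric'] == 'ClassificationQualityMetric')
accuracy = quality['result']['current']['accuracy']  # Accuracy on the current data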
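
Because the JSON layout can differ between Evidently versions, a small lookup helper that returns None instead of raising makes the extraction step easier to debug. This is a sketch, not part of Evidently: the function name get_metric_value is hypothetical, and the sample dictionary is hand-written in the assumed layout with made-up numbers.

```python
def get_metric_value(report_data, metric_name, field, dataset="current"):
    """Look up one value in a report dictionary.

    Assumes the layout {"metrics": [{"metric": <name>, "result": {<dataset>: {...}}}]}.
    Returns None if the metric, dataset, or field is missing.
    """
    for entry in report_data.get("metrics", []):
        if entry.get("metric") == metric_name:
            return entry.get("result", {}).get(dataset, {}).get(field)
    return None


# Hand-written sample in the assumed layout; the numbers are made up
sample_report = {
    "metrics": [
        {
            "metric": "ClassificationQualityMetric",
            "result": {
                "current": {"accuracy": 0.82, "precision": 0.80},
                "reference": {"accuracy": 0.90, "precision": 0.88},
            },
        }
    ]
}

sample_accuracy = get_metric_value(sample_report, "ClassificationQualityMetric", "accuracy")
print(sample_accuracy)
```

If the helper returns None on a real report, printing the "metric" names found in the file is a quick way to see what the installed version actually wrote.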

Logging Model and Metrics in MLflow

To keep track of our model's performance over time, we will log our model and the accuracy metrics into MLflow:

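mlflow.set_experiment("monitoring_with_evidently_AI")
# Group the parameter, metric, and model under one explicit run
with mlflow.start_run():
    mlflow.log_param("model_name", "Logistic Regression")
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")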
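
A report usually contains more than accuracy, and MLflow can record several metrics in one call with mlflow.log_metrics. The sketch below filters a results dictionary down to its numeric entries first, since MLflow metrics must be numbers. The helper names and the sample values are illustrative assumptions, not part of either library.

```python
def numeric_metrics(result, prefix=""):
    """Keep only the int/float entries of a dict, optionally prefixing the keys."""
    return {
        f"{prefix}{key}": float(value)
        for key, value in result.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def log_metrics_to_mlflow(metrics, run_name=None):
    """Log a dict of metrics inside a single MLflow run."""
    import mlflow  # imported here so numeric_metrics works without MLflow installed

    with mlflow.start_run(run_name=run_name):
        mlflow.log_metrics(metrics)


# Made-up values shaped like one dataset's block of classification results
sample_result = {"accuracy": 0.82, "precision": 0.80, "recall": 0.75, "target_name": "target"}
sample_metrics = numeric_metrics(sample_result, prefix="current_")
print(sample_metrics)
```

Calling log_metrics_to_mlflow(sample_metrics) would then store all three values in one run; the prefix keeps metrics from the current and reference data apart in the MLflow UI.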

Finally, run the MLflow UI with the command:

mlflow ui

You can now open the UI in your browser (by default at http://127.0.0.1:5000) and see the logged experiment.

Conclusion

Congratulations! You have now integrated Evidently AI metrics with MLflow to monitor your machine learning models effectively. This integration provides you with a robust mechanism to track and visualize your model's performance over time.


Keywords

Evidently AI
MLflow
Machine Learning
Model Monitoring
Logistic Regression
Dataset
Accuracy Metrics
Performance Report

FAQ

Q1: What is Evidently AI?
A1: Evidently AI is an open-source Python library for evaluating and monitoring machine learning models and data, including generating model performance reports.

Q2: What is MLflow?
A2: MLflow is an open-source platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.

Q3: How do I install the required libraries?
A3: You can install the required libraries by running pip install evidently mlflow pandas scikit-learn in your terminal.

Q4: How do I visualize the MLflow UI?
A4: You can visualize the MLflow UI by running the command mlflow ui and accessing it through your web browser.

Q5: What dataset is used in this integration?
A5: The tutorial uses a student performance dataset hosted on Kaggle.