Integrating Evidently AI with MLflow for ML Model Monitoring
Introduction
Monitoring model performance is crucial for keeping machine learning models accurate and reliable. This article shows how to integrate Evidently AI with MLflow: Evidently AI generates a performance report for the model, and MLflow records the resulting metrics with each run so they can be compared over time. Let's walk through the steps required to accomplish this integration.
Installation
First, we need to install the required libraries. Make sure you have Evidently AI, MLflow, and a few other packages installed. You can use the following command in your terminal:
pip install evidently mlflow pandas scikit-learn
If you have them already installed, you will see messages indicating that the requirements are satisfied.
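Because library APIs change between releases (Evidently's in particular), it can help to record which versions are installed before running the rest of the tutorial. The following is a minimal sketch using only the Python standard library; the helper name installed_versions is an illustrative choice, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    required = ["evidently", "mlflow", "pandas", "scikit-learn"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver or 'not installed'}")
```

Keeping this output next to your experiment notes makes it easier to reproduce a report later with the same library versions.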
Importing Libraries
Next, we will import the necessary libraries for our project:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from evidently import ColumnMapping
from evidently.report import Report
import mlflow
Loading and Preparing the Dataset
For this tutorial, we will use a student performance dataset hosted on Kaggle (see the link in the description). The preparation steps are:
- Load the dataset.
- Use Label Encoder to encode the result feature.
- Set up the input and output features for training and testing.
- Encode categorical variables in the input features using the get_dummies method.
- Split the dataset into training and testing sets.
The code to accomplish this is as follows:
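data = pd.read_csv('student_performance.csv')  # Replace with the path to your dataset
label_encoder = LabelEncoder()
data['target'] = label_encoder.fit_transform(data['target_column'])  # Replace 'target_column' with the actual result column
# Drop both the encoded target and the original result column to avoid label leakage
X = data.drop(['target', 'target_column'], axis=1)
X = pd.get_dummies(X)  # One-hot encode the categorical input features
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)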
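To make the two encoding steps concrete, here is a small sketch on a made-up table. The column names (hours_studied, school, result) and the helper prepare_features are assumptions for illustration only and do not come from the Kaggle dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def prepare_features(df, target_column):
    """Label-encode the target and one-hot encode the remaining columns."""
    encoder = LabelEncoder()
    y = pd.Series(
        encoder.fit_transform(df[target_column]), index=df.index, name="target"
    )
    # Dropping the raw target column keeps the label out of the input features
    X = pd.get_dummies(df.drop(columns=[target_column]))
    return X, y, list(encoder.classes_)


# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "hours_studied": [2, 8, 5, 1],
    "school": ["A", "B", "A", "B"],
    "result": ["fail", "pass", "pass", "fail"],
})
X, y, classes = prepare_features(toy, "result")
print(list(X.columns))  # numeric column kept, 'school' expanded into indicator columns
print(list(y))          # integer codes for the result labels
print(classes)          # label order used by the encoder
```

LabelEncoder assigns integer codes in sorted label order, and get_dummies leaves numeric columns untouched while expanding each categorical column into one indicator column per category.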
Training the Logistic Regression Model
Now, we will train a logistic regression model and generate a performance report using Evidently AI:
model = LogisticRegression()
model.fit(X_train, y_train)
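train_pred = model.predict(X_train)  # Predict before extra columns are added to the features
test_pred = model.predict(X_test)
Evidently AI evaluates classification quality from two columns: the ground truth and the model output, which by default are named "target" and "prediction". Predicted labels are used here; probabilities from predict_proba can be used instead, but they require a matching column mapping.
Next, we save both columns in the training and testing datasets.
X_train['target'], X_train['prediction'] = y_train, train_pred
X_test['target'], X_test['prediction'] = y_test, test_pred
Then, we pass the training set as reference data and the test set as current data to Evidently AI and save the report in JSON format locally. The calls below follow the Report interface that matches the imports used in this article; other Evidently releases may expose a different API.
from evidently.metric_preset import ClassificationPreset
column_mapping = ColumnMapping(target='target', prediction='prediction')
report = Report(metrics=[ClassificationPreset()])
report.run(reference_data=X_train, current_data=X_test, column_mapping=column_mapping)
report.save_json('performance_report.json')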
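Before relying on a number taken from a report, it can be useful to compute the same metric directly with scikit-learn as a cross-check. The sketch below does this on synthetic data from make_classification, which is only a stand-in for the student dataset; in the tutorial you would use your own X_test and y_test.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student dataset (an assumption for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on held-out data: the fraction of test rows predicted correctly
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.3f}")
```

If the accuracy reported for the current data differs from this value on the same rows, the column mapping or the roles of the reference and current datasets are the first things to check.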
Extracting Accuracy Metrics
To help us understand our model's accuracy, we need to extract metrics from the JSON report.
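import json
with open('performance_report.json') as f:
    report_data = json.load(f)
# 'metrics' is a list of entries; find the classification quality entry by name
quality = next(m for m in report_data['metrics'] if m['metric'] == 'ClassificationQualityMetric')
accuracy = quality['result']['current']['accuracy']  # Accuracy on the current (test) data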
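The exact layout of the JSON depends on the Evidently version and on which metrics the report contains, so it is worth inspecting the file before hard-coding a path into it. The sketch below assumes the report holds a list of entries under "metrics", each with a "metric" name and a "result" dictionary. The helper find_metric_result and the sample payload are hypothetical and the numbers are made up.

```python
import json


def find_metric_result(report_data, metric_name):
    """Return the 'result' dict of the first entry whose 'metric' equals metric_name."""
    for entry in report_data.get("metrics", []):
        if entry.get("metric") == metric_name:
            return entry.get("result", {})
    raise KeyError(f"metric {metric_name!r} not found in report")


# Hypothetical payload mimicking the assumed layout; the values are made up
sample = json.loads("""
{"metrics": [
  {"metric": "ClassificationQualityMetric",
   "result": {"current": {"accuracy": 0.9}, "reference": {"accuracy": 0.95}}}
]}
""")

result = find_metric_result(sample, "ClassificationQualityMetric")
current_accuracy = result["current"]["accuracy"]
print(current_accuracy)
```

Raising a KeyError with the metric name makes a layout mismatch obvious instead of failing later with a less specific error.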
Logging Model and Metrics in MLflow
To keep track of our model's performance over time, we will log our model and the accuracy metrics into MLflow:
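mlflow.set_experiment("monitoring_with_evidently_AI")
with mlflow.start_run():  # Group the parameter, metric, and model under one run
    mlflow.log_param("model_name", "Logistic Regression")
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")  # Store the fitted model as a run artifact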
Finally, run the MLflow UI with the command:
mlflow ui
You can now open the UI in your browser (by default at http://127.0.0.1:5000) and see the experiment logged.
Conclusion
Congratulations! You have now integrated Evidently AI metrics with MLflow to monitor your machine learning models effectively. This integration provides you with a robust mechanism to track and visualize your model's performance over time.
Thank you for reading! The GitHub repository and Medium article linked in the description contain all the code and datasets used in this article.
Keywords
- Evidently AI
- MLflow
- Machine Learning
- Model Monitoring
- Logistic Regression
- Dataset
- Accuracy Metrics
- Performance Report
FAQ
Q1: What is Evidently AI?
A1: Evidently AI is an open-source Python library for evaluating and monitoring machine learning models and data. In this tutorial it is used to generate a performance report for a classification model.
Q2: What is MLflow?
A2: MLflow is an open-source platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.
Q3: How do I install the required libraries?
A3: You can install the required libraries by running pip install evidently mlflow pandas scikit-learn in your terminal.
Q4: How do I open the MLflow UI?
A4: Run the command mlflow ui in your terminal and open the address it prints in your web browser.
Q5: What dataset is used in this integration?
A5: The tutorial uses a student performance dataset, which you can find linked in the description.