Efficient ML Monitoring with Evidently AI: Step-by-Step Guide for Report Generation

Introduction

In this article, we will explore Evidently AI, an open-source Python library for evaluating and monitoring machine learning (ML) models. Evidently AI lets you track model performance, detect data drift and other potential issues, and generate interactive reports. We will demonstrate how to use it with the California Housing dataset.

Getting Started

Setup Environment

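To kick things off, open your terminal or Jupyter Lab and run the following command to install the required libraries. The code in this guide uses Evidently's legacy Report API (evidently.report and evidently.ColumnMapping), which was reorganized in release 0.7, so we pin Evidently to an earlier version and install the other libraries we need alongside it:

pip install "evidently<0.7" pandas numpy scikit-learn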
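Because the import paths used in this guide changed in newer releases, it can help to confirm which version is installed before going further. The sketch below reads the installed version with the standard library's importlib.metadata and compares it with the 0.7 cutoff. The helper names are hypothetical, and the cutoff reflects our understanding of Evidently's release history, so adjust it if your version behaves differently.

```python
import re
from importlib import metadata


def installed_evidently_version():
    """Return the installed Evidently version string, or None if it is not installed."""
    try:
        return metadata.version("evidently")
    except metadata.PackageNotFoundError:
        return None


def has_legacy_report_api(version):
    """Assumption: evidently.report.Report and evidently.ColumnMapping exist before 0.7."""
    match = re.match(r"(\d+)\.(\d+)", version)  # keep only major.minor, ignore suffixes like rc1
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (0, 7)


version = installed_evidently_version()
if version is None:
    print("Evidently is not installed")
elif has_legacy_report_api(version):
    print(f"Evidently {version}: legacy Report API available")
else:
    print(f"Evidently {version}: install a release below 0.7 to follow this guide")
```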

Once the necessary libraries are installed, we can import the required tools. Here’s a list of the libraries we will import:

import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from evidently import ColumnMapping
from evidently.report import Report

These imports will help us with data manipulation, report generation, and running various analyses.

Loading The Dataset

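Next, we will load the California Housing dataset, which ships with scikit-learn (the first call downloads it). We keep the true house values in a 'target' column and add a 'prediction' column equal to the target plus some random noise. The prediction stands in for a model's output and makes our analysis a bit more interesting:

california_housing = fetch_california_housing()
data = pd.DataFrame(data=california_housing.data, columns=california_housing.feature_names)
data['target'] = california_housing.target  # true median house value, in units of $100,000
rng = np.random.default_rng(42)  # seeded so the noise is reproducible
data['prediction'] = california_housing.target + rng.normal(0, 0.1, size=len(california_housing.target))  # simulated model output

We now have a DataFrame with the eight features, the true values in 'target', and the simulated predictions in 'prediction'. These are the column names Evidently's ColumnMapping looks for by default, so Evidently can identify both columns without extra configuration.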
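Since the prediction is just the target plus Gaussian noise with standard deviation 0.1, we already know roughly what error metrics a regression quality check should report: a root mean squared error (RMSE) close to 0.1 and a mean absolute error (MAE) close to 0.1 · √(2/π) ≈ 0.08. The sketch below checks this with NumPy on a synthetic stand-in for the target, an assumption made so it runs without downloading the dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for california_housing.target; illustrative values only
target = rng.uniform(0.15, 5.0, size=20_640)
prediction = target + rng.normal(0, 0.1, size=target.size)  # same noise model as above

errors = prediction - target
rmse = np.sqrt(np.mean(errors ** 2))
mae = np.mean(np.abs(errors))
expected_mae = 0.1 * np.sqrt(2 / np.pi)  # mean absolute value of N(0, 0.1^2) noise
print(f"RMSE: {rmse:.3f}, MAE: {mae:.3f}, expected MAE: {expected_mae:.3f}")
```

If the metrics you see for the real data are far from these values, the prediction column was probably built differently than intended.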

Splitting Data

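To analyze possible data drift, we need two datasets: a reference dataset, typically the data the model was trained or validated on, and a current dataset with the most recent observations. The California Housing data has no timestamps, so we simulate this by sampling half of the rows at random as the reference set and using the remaining rows as the current set. Because both halves come from the same distribution, the drift report should show little or no drift, which makes a useful baseline.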

reference_data = data.sample(frac=0.5, random_state=42)
current_data = data.drop(reference_data.index)
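A random split is a good no-drift baseline, but to see what the report looks like when something has actually changed, you can build the current dataset from a slice that differs systematically from the reference. The helper below is a hypothetical sketch, not part of Evidently: given a column name, it sorts by that column and uses the lower part as reference and the upper part as current, which guarantees a shift in that column (and in any column correlated with it). The DataFrame here is synthetic.

```python
import numpy as np
import pandas as pd


def split_reference_current(df, by=None, frac=0.5, seed=42):
    """Hypothetical helper: random split if `by` is None, otherwise a deliberately shifted split."""
    if by is None:
        reference = df.sample(frac=frac, random_state=seed)
    else:
        n_ref = int(len(df) * frac)
        reference = df.sort_values(by).iloc[:n_ref]  # lowest values of `by` become the reference
    current = df.drop(reference.index)
    return reference, current


# Synthetic stand-in for the housing DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({"MedInc": rng.lognormal(1.2, 0.4, 1000), "HouseAge": rng.integers(1, 53, 1000)})

ref_rand, cur_rand = split_reference_current(df)
ref_shift, cur_shift = split_reference_current(df, by="MedInc")
print(ref_rand["MedInc"].mean(), cur_rand["MedInc"].mean())    # should be similar
print(ref_shift["MedInc"].mean(), cur_shift["MedInc"].mean())  # current mean clearly higher
```

Passing a shifted split such as this to the drift report should flag the sorting column as drifted, which is a quick way to confirm the report reacts the way you expect.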

Generating a Data Drift Report

With our data prepared, we can now generate a data drift report to identify any shifts between our reference and current datasets. Running the following code will provide us with the necessary insights:

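from evidently.metric_preset import DataDriftPreset
column_mapping = ColumnMapping(target='target', prediction='prediction')  # explicit column roles (these match the defaults)
report = Report(metrics=[DataDriftPreset()])  # preset that checks drift for every column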
report.run(reference_data=reference_data, current_data=current_data, column_mapping=column_mapping)
report.show()

This renders an interactive report inside Jupyter Lab. For each column, it shows the reference and current distributions together with the statistical test used, the drift score, and whether drift was detected.
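Under the hood, the drift report runs a statistical test per column. According to Evidently's documentation, the default for numerical columns depends on the size of the reference data: a two-sample Kolmogorov-Smirnov test (drift if p < 0.05) when it has 1,000 rows or fewer, and a Wasserstein distance normalized by the reference standard deviation (drift if above 0.1) for larger data. The function below approximates that logic with SciPy to make the idea concrete. It is our sketch, not Evidently's code, and the library's exact rules (including those for categorical columns) may differ between versions.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance


def numerical_drift(reference, current, p_threshold=0.05, wd_threshold=0.1):
    """Approximate per-column drift check; returns (test name, score, drift detected)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    if len(reference) <= 1000:
        # Small reference data: two-sample Kolmogorov-Smirnov test, drift if the p-value is low
        p_value = ks_2samp(reference, current).pvalue
        return "ks", p_value, p_value < p_threshold
    # Large reference data: Wasserstein distance scaled by the reference spread
    norm = max(np.std(reference), 1e-3)  # guard against zero variance
    distance = wasserstein_distance(reference, current) / norm
    return "wasserstein_norm", distance, distance > wd_threshold


rng = np.random.default_rng(1)
ref = rng.normal(0, 1, 5000)
print(numerical_drift(ref, rng.normal(0, 1, 5000)))    # same distribution: no drift expected
print(numerical_drift(ref, rng.normal(0.5, 1, 5000)))  # mean shifted by half a std: drift expected
```

The size-dependent choice matters because p-value tests become extremely sensitive on large samples and flag even negligible shifts, while a distance with a fixed threshold reflects how large the shift actually is.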

Custom Reports for Specific Columns

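Sometimes, you may wish to generate reports for specific columns. For instance, let's generate a report for the 'AveRooms' column (the average number of rooms per household):

from evidently.metrics import ColumnDriftMetric
column_report = Report(metrics=[ColumnDriftMetric(column_name='AveRooms')])  # drift check for one column only
column_report.run(reference_data=reference_data, current_data=current_data)
column_report.show()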

This will provide a detailed overview and visual representation of the data drift associated with this particular column.
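For a quick numeric check alongside the visual report, a plain pandas comparison of the column's summary statistics in both datasets often makes the size of a shift obvious. The helper below is hypothetical and uses only pandas, not an Evidently API; the data is a synthetic stand-in.

```python
import numpy as np
import pandas as pd


def compare_column(reference, current, column):
    """Side-by-side summary statistics of one column in the reference and current data."""
    return pd.DataFrame({
        "reference": reference[column].describe(),
        "current": current[column].describe(),
    })


rng = np.random.default_rng(3)
ref = pd.DataFrame({"AveRooms": rng.normal(5.4, 1.0, 1000)})  # synthetic stand-in values
cur = pd.DataFrame({"AveRooms": rng.normal(6.0, 1.0, 1000)})
summary = compare_column(ref, cur, "AveRooms")
summary["difference"] = summary["current"] - summary["reference"]
print(summary.round(2))
```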

Combining Metrics

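Evidently AI lets you customize your reports by combining several metrics, for as many columns as you need, in a single Report:

from evidently.metrics import ColumnDriftMetric, ColumnSummaryMetric
report = Report(metrics=[ColumnDriftMetric(column_name='MedInc'), ColumnDriftMetric(column_name='AveRooms'),
                         ColumnSummaryMetric(column_name='HouseAge')])  # drift for two columns, summary stats for a third
report.run(reference_data=reference_data, current_data=current_data)
report.show()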
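When several columns are checked together, Evidently also summarizes the result at the dataset level. As we understand its documentation for DatasetDriftMetric and DataDriftPreset, the dataset is considered drifted by default when at least half of the columns drift (the drift_share parameter). The hypothetical helper below reproduces that aggregation over a dict of per-column verdicts; it is a sketch of the rule, not Evidently's implementation.

```python
def dataset_drift(column_drift, drift_share=0.5):
    """Aggregate per-column drift flags into a dataset-level verdict."""
    if not column_drift:
        raise ValueError("need at least one column")
    drifted = [name for name, flag in column_drift.items() if flag]
    share = len(drifted) / len(column_drift)
    return {"drifted_columns": drifted, "share": share, "dataset_drift": share >= drift_share}


result = dataset_drift({"MedInc": True, "AveRooms": True, "HouseAge": False, "Population": False})
print(result)
# {'drifted_columns': ['MedInc', 'AveRooms'], 'share': 0.5, 'dataset_drift': True}
```

Lowering drift_share makes the dataset-level alarm more sensitive, which can be useful when only a few features matter most to the model.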

Exporting Reports

After generating insightful reports, you may want to export them for further analysis or for sharing with your team:

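report_dict = report.as_dict()  # computed metric values as a Python dictionary
report_json = report.json()  # the same results as a JSON string
report.save_html('report.html')  # standalone interactive HTML file to share

The dictionary and JSON outputs are convenient for further processing or for storing results over time, while the HTML file can be opened in any browser by teammates who don't use Jupyter.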
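If you post-process the dictionary from as_dict() (for example, to keep only a few values) and then save it yourself, it may contain NumPy scalars or arrays that the standard json module cannot serialize. The converter below is a hypothetical helper using only the standard library and NumPy; the small dictionary is a made-up stand-in with invented values, not the real structure Evidently returns.

```python
import json

import numpy as np


def to_jsonable(value):
    """Fallback for json.dumps: convert values it cannot serialize natively."""
    if isinstance(value, np.generic):
        return value.item()   # NumPy scalar -> Python int/float/bool
    if isinstance(value, np.ndarray):
        return value.tolist()
    return str(value)         # last resort, e.g. timestamps


# Made-up stand-in for a few values extracted from report.as_dict()
summary = {
    "drift_detected": np.bool_(True),
    "drift_score": np.float64(0.42),
    "current_count": np.int64(10320),
}
text = json.dumps(summary, default=to_jsonable, indent=2)
print(text)
```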

Conclusion

Evidently AI makes ML monitoring significantly more manageable. The steps outlined in this guide let you generate detailed, insightful reports that help you monitor and improve model performance over time.

Keywords

Evidently AI
ML Monitoring
California Housing Dataset
Data Drift
Report Generation
Model Performance

FAQ

Q1: What is Evidently AI? A1: Evidently AI is a tool designed for monitoring machine learning models, allowing users to detect issues and generate reports.

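Q2: How do I install Evidently AI? A2: You can install Evidently AI with pip using the command pip install evidently. To follow the legacy Report API used in this guide, pin an earlier release with pip install "evidently<0.7".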

Q3: What dataset did we use for the demonstration? A3: The California Housing dataset was used for the demonstration of Evidently AI functionalities.

Q4: Can I generate reports for specific columns? A4: Yes. Add a column-level metric such as ColumnDriftMetric(column_name='AveRooms') to the report to analyze only the column you are interested in.

Q5: How can I export the generated reports? A5: Reports can be exported as a Python dictionary (as_dict()), as a JSON string (json()), or as a standalone HTML file (save_html()) for further analysis or sharing.