MLOps Basic Open-Source Tool Series #1: MLflow for Experiment Tracking and Model Management
A new series discussing MLOps open-source tools for your machine learning needs. This first edition covers experiment tracking and model management.
This newsletter is part of a series introducing open-source tools used in Machine Learning Operations (MLOps). Each edition will introduce new tools for a different part of the process. At the end of the series, we will combine everything into a cohesive MLOps project.
Many machine learning models end up stuck in storage and never deployed. That is unfortunate, because the best machine learning model is the one that makes it into production.
In the modern era, however, it's about more than just deploying a model.
As the data science and machine learning fields mature, providing business value is not just a matter of developing a model and getting it into production; it's also about maintaining and monitoring the model to ensure it keeps delivering value to the business.
This is where the MLOps field emerges. MLOps stands for Machine Learning Operations and consists of a collection of techniques and tools for deploying ML models in production.
The field is a combination of:
DevOps,
Machine Learning.
MLOps is an important field, as it provides continuous value and reduces technical debt within the machine learning life cycle. The standard process is shown in the image below.
This graph will guide us through the important components for building the MLOps structure, and it will also serve as our basis for the overall structure.
In this series, we will go through each component of this graph.
In the current newsletter, we will go through the Experiment Tracking and Model Management component using the open-source tool MLflow.
So, let’s learn about it!
MLFlow
MLflow is an open-source tool developed to manage the machine learning process. It's designed to cover the ML lifecycle components while remaining easy to use.
It's usually used for experiment tracking, but it can also be used as a model registry and for metadata. It's also often used for reproducibility and deployment, which helps us maintain the cycle.
Recently, MLflow has also provided ways to manage LLM applications (LLMOps), but we will focus on classic machine learning models.
Within MLflow, there are several core functionalities:
Tracking: MLflow Tracking offers an API and UI for centralized logging of ML process details, enabling easy comparison and insight into model evolution.
Model Registry: The Model Registry provides a centralized system for managing ML model versions, states, and productionization, complete with store, APIs, and UI for full lifecycle collaboration.
Recipes (Previously Pipeline): Recipes guide ML project structuring, focusing on achieving functional outcomes optimized for real-world deployment.
Projects: MLflow Projects standardize ML code packaging into executable formats, using descriptors to define dependencies and execution within directories or Git repositories.
Since we will focus on Experiment Tracking and the Model Registry, let's start with those.
MLFlow for Experiment Tracking
Experiment tracking is the practice of logging the experiment process, including the code, variables, parameters, and anything else we consider important to track.
Why do we want to track our machine learning experiments? There are many reasons. Tracking machine learning experiments in MLOps ensures reproducibility, helps with model comparison and selection, and maintains auditability.
In this tutorial, we will track our experiment locally with a sample dataset. We will track the parameters and the model used for training.
To start the tutorial, we will install the MLflow Python package. I assume you are using a virtual environment throughout the process, as that is best practice.
pip install mlflow
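The tutorial below also uses Scikit-Learn, XGBoost, Seaborn, and pandas. If they are not already in your environment, you can install them the same way (assuming the standard PyPI package names):
pip install scikit-learn xgboost seaborn pandas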
After the installation, let's set up the local server that will store all the experiment data from MLflow. A cloud server is also possible, but we will start with a local one. The installation comes with CLI tools, so we can start the local server with the following command.
mlflow ui
If everything works properly, visit the address serving the MLflow server (by default, http://127.0.0.1:5000), and you should see a UI like the image above.
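If the default port is already in use on your machine, the mlflow ui command also accepts --host and --port flags, for example:
mlflow ui --host 127.0.0.1 --port 8080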
Before we go further, you should understand two terms used in MLflow:
Runs: MLflow Tracking centers on "runs," executions of data science code (like a Python script), capturing metadata (metrics, parameters, times) and artifacts (model weights, images, etc.) from each run.
Experiments: An experiment groups together the runs for a specific task. It can be created using the CLI, API, or UI, with capabilities to search for experiments via the API and UI.
Basically, we set up an Experiment that groups all the Runs, where each Run contains whatever we choose to track.
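To make the hierarchy concrete, here is a minimal sketch of one Run logged inside one Experiment; the experiment name, run name, and logged values are all placeholders.
import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("demo-experiment")  # placeholder experiment name

# Everything logged inside this block belongs to a single Run of the Experiment above
with mlflow.start_run(run_name="demo-run"):  # placeholder run name
    mlflow.log_param("alpha", 0.5)   # a parameter of the Run
    mlflow.log_metric("rmse", 0.42)  # a metric of the Run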
With that in mind, let's set up an experiment to understand how MLflow tracks our experiments. For the example dataset, I will use the Titanic sample data with some columns pre-selected.
import seaborn as sns
import pandas as pd

# Load the Titanic sample dataset and keep only the pre-selected columns
df = sns.load_dataset('titanic')
df = df[['survived', 'pclass', 'age', 'sibsp', 'parch', 'adult_male']].dropna().reset_index(drop=True)
df.head()
We will track three classifier models for this experiment: Logistic Regression and K-NN from Scikit-Learn, and the XGBoost Classifier. Each model will use different hyperparameters on the same dataset. We will also track two metrics: accuracy and F1 score.
Let's split the data first, using the following code.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.drop('survived', axis = 1), df['survived'], test_size=0.2, random_state=42)
Then, let's set up the Experiment for MLflow to track. First, we need to set the tracking URI, which points to our local server.
import mlflow
mlflow.set_tracking_uri(uri="http://127.0.0.1:5000")
Next, we set up the Experiment group with the following code.
EXPERIMENT_NAME = "Titanic-Survived-Classifier-Experiment"
mlflow.set_experiment(EXPERIMENT_NAME)
If you want to return to this experiment later, you don't need to create it again. Instead, you only need the experiment ID to retrieve the intended experiment and pass that ID during the tracking process.
current_experiment = dict(mlflow.get_experiment_by_name(EXPERIMENT_NAME))
experiment_id = current_experiment['experiment_id']
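As a side note, in recent MLflow versions mlflow.set_experiment also returns the Experiment object, so the two steps above can be collapsed into one line; this is a minor shortcut that assumes a reasonably recent MLflow release.
# Equivalent shortcut, assuming a recent MLflow version
experiment_id = mlflow.set_experiment(EXPERIMENT_NAME).experiment_id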
For the experiment, we will use the following setup.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
models_to_experiment = {
    "LR": LogisticRegression,
    "K-NN": KNeighborsClassifier,
    "XGB Classifier": XGBClassifier
}

params_to_experiment = {
    'LR': {
        'solver': 'lbfgs',
        'C': 1,
        'max_iter': 100
    },
    'K-NN': {
        'n_neighbors': 5,
        'weights': 'uniform',
        'algorithm': 'auto'
    },
    'XGB Classifier': {
        'max_depth': 3,
        'learning_rate': 0.1,
        'n_estimators': 100,
        'booster': 'gbtree',
        'gamma': 0,
        'use_label_encoder': False,
        'eval_metric': 'logloss'
    }
}
I set up the models and the hyperparameters we will use in separate dictionaries. We will run each model with its hyperparameters to see how MLflow tracks them.
The last step is to run the experiment and let MLFlow track them. To do that, we can use the following code.
from sklearn.metrics import accuracy_score, f1_score
from mlflow.models import infer_signature
for model_name in models_to_experiment.keys():
    RUN_NAME = f'Titanic Classifier Experiment {model_name}'
    with mlflow.start_run(experiment_id=experiment_id, run_name=RUN_NAME):
        # Train the model with its hyperparameters and predict on the test set
        params = params_to_experiment[model_name]
        model = models_to_experiment[model_name](**params)
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)

        # Calculate the evaluation metrics
        accuracy = accuracy_score(y_test, predictions)
        f1 = f1_score(y_test, predictions, average='weighted')

        # Log the hyperparameters
        mlflow.log_params(params)

        # Log the evaluation metrics
        mlflow.log_metric(f"{model_name}_accuracy", accuracy)
        mlflow.log_metric(f"{model_name}_f1", f1)

        # Set a tag that we can use to remind ourselves what this run was for
        mlflow.set_tag("Training Info", f"{model_name} model for Titanic")

        # Infer the model signature (input and output schema)
        signature = infer_signature(X_train, model.predict(X_train))

        # Log the model with the flavor that matches the framework
        if model_name == "XGB Classifier":
            model_info = mlflow.xgboost.log_model(
                xgb_model=model,
                artifact_path=f"titanic_{model_name}_model",
                signature=signature,
                input_example=X_train,
                registered_model_name=f"tracking-titanic-{model_name}",
            )
        else:
            model_info = mlflow.sklearn.log_model(
                sk_model=model,
                artifact_path=f"titanic_{model_name}_model",
                signature=signature,
                input_example=X_train,
                registered_model_name=f"tracking-titanic-{model_name}",
            )

        # Log the training dataset (features, target, and model output)
        training_df = pd.concat([X_train, y_train], axis=1).reset_index(drop=True)
        training_df["ModelOutput"] = model.predict(X_train)
        dataset = mlflow.data.from_pandas(
            training_df, targets="survived", predictions="ModelOutput", name=f"data_{model_name}"
        )
        mlflow.log_input(dataset, context="training data")

# Not strictly required: each run is already ended by its with-block
mlflow.end_run()
If you run the above code successfully, everything you logged during the process will be stored in MLflow. Take a look at your MLflow UI to see the result.
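If you prefer to compare the runs programmatically rather than only through the UI, a minimal sketch using mlflow.search_runs is shown below; the exact column names can vary slightly between MLflow versions.
# Returns a pandas DataFrame with one row per run in the experiment
runs_df = mlflow.search_runs(experiment_ids=[experiment_id])

# Keep the run name and any logged metric columns for a quick comparison
metric_cols = [col for col in runs_df.columns if col.startswith("metrics.")]
print(runs_df[["run_id", "tags.mlflow.runName"] + metric_cols])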
Let's break down the functions we used above. Each model experiment is executed in a separate Run, which is set up in this code:
RUN_NAME = f'Titanic Classifier Experiment {model_name}'
with mlflow.start_run(experiment_id=experiment_id, run_name=RUN_NAME):
The RUN_NAME parameter controls the run name shown in the UI. Choose an appropriate name that you can easily recognize later. Then, there are a few functions we used for tracking:
mlflow.log_params: Track the hyperparameters of the model we are experimenting with.
mlflow.log_metric: Track the model metrics.
mlflow.set_tag: Provide a tag with extra information about the run.
infer_signature: Infer an MLflow model signature, which describes the model's input and output data.
mlflow.xgboost.log_model and mlflow.sklearn.log_model: Track the model information and artifacts. See this list for the built-in frameworks that MLflow can track.
mlflow.log_input: Track the dataset used for training.
For each run, click the run name to see all the information you chose to log. For example, let's look at the XGB Classifier we experimented with earlier.
You can also see the Hyperparameter details and the Metrics.
It is also possible to see the Model Artifacts.
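Beyond inspecting the artifacts in the UI, you can also load a logged model back into Python. Here is a minimal sketch, assuming you still have the model_info object from the last iteration of the training loop above:
import mlflow

# model_info.model_uri points at the model artifact logged in that run
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# The pyfunc flavor exposes a generic predict() that accepts a pandas DataFrame
print(loaded_model.predict(X_test)[:10])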
You can explore the UI yourself to get a feel for it. Experiment with the logging and check the documentation as well.
Because we passed the registered_model_name argument when logging, tracking the model with the code above also automatically registers it. You can find the registered models in the Models section of the UI.
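Since the models are registered, you can also load them later by name and version instead of by run. A minimal sketch, assuming the registered name from the loop above and that this is version 1:
import mlflow

# "models:/<registered model name>/<version>" is the registry URI format
registry_model = mlflow.pyfunc.load_model("models:/tracking-titanic-LR/1")
print(registry_model.predict(X_test)[:10])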
Conclusion
This is the first edition of the MLOps basics series. We have learned how to use MLflow for experiment tracking and model management.
We will continue learning the MLOps basics in the next newsletter and combine everything at the end. So, stay tuned!