MLflow: The Cure To Your Machine Learning Headaches

Streamline your machine learning workflow with MLflow: a comprehensive guide.


Are you tired of keeping track of your machine learning experiments in scattered log files 😫 ? Fear not, for I have come to introduce you to MLflow ! With this open-source platform, you can finally say goodbye to the days of lost models and broken code.

It not only simplifies the management of your machine learning projects but also provides a unified interface for tracking your experiments, packaging your code, and deploying your models 😎

In this article, we will cover the fundamentals of MLflow, highlight its essential features, discuss its limitations, and, most importantly, explain why MLflow is crucial for collaborative machine learning projects 🚀

Before starting, if you’re interested in deploying MLflow on Google Cloud Platform, feel free to read my article titled: Effortless MLflow Deployment With Terraform On GCP.

A GitHub Template for Machine Learning Projects

I’m excited to share that I have built an MLflow template on GitHub that you can use to create a new repository with all the necessary components for a streamlined machine learning workflow. This template includes the following awesome features:

  • Sample PyTorch script that trains on MNIST and logs metrics and parameters to MLflow
  • Pre-built Continuous Integration (CI) with guidelines for customization
  • Pre-commit configuration
  • Advice on using the GitHub Flow workflow (to understand how to use it, check out my last article)

With this template, you can quickly set up a new machine learning project with all the necessary tools and configurations, saving you time and effort. Simply clone the template repository and customize it as you see fit. I will refer to this template throughout the article to show you how to use MLflow in practice. You can find the template through this link.

MLflow Demystified

MLflow is an open-source platform for Machine Learning that covers the entire ML model lifecycle, from development to production and retirement. It is a central place to collaborate and manage the model lifecycle, providing data science teams with an efficient and organized workflow.

MLflow is composed of the following main modules:

  • Tracking: it tracks experiments by logging parameters, metrics, and artifacts in a centralized location.
  • Packaging: allows you to package your code and models in a reproducible way.
  • User Interface: you can visualize your experiments and compare past runs with new ones.
  • Model Registry: provides a central registry for storing and managing your models in a structured way, including versioning, staging, and deployment.
  • Model Deployment: allows you to deploy your models in production.

Continue reading, as we will provide a detailed explanation of each module and the reasons why you should use them 🤓

Advantages of MLflow

In my opinion, the main advantages of MLflow are: 👍

  • Compatibility: it is designed to work with a wide range of machine learning libraries, including TensorFlow, PyTorch, scikit-learn and more
  • Flexibility: you can use MLflow with a variety of deployment options, including on-premise, cloud, and hybrid environments, and integrate it with other tools such as Kubeflow
  • A large community of users and contributors
  • Easy to use

MLflow is one of the most widely used frameworks in the ML community, particularly among experiment tracking tools. It is used by companies such as Databricks, Facebook, and many more. This matters when deciding to learn a new framework, as it is frustrating to spend time learning a tool that will fall out of favor in the future 😭

Overall, data science teams can streamline their workflow, increase productivity, and reduce the time and effort required to develop machine learning models.

MLflow Drawbacks

MLflow is a great tool, but it is not perfect. After using MLflow on several projects, I have identified two main issues:

  1. Deployment: The deploying features are very limited and are suitable for small projects only. If you want to deploy your models in production with medium to high availability, you will need to use other tools such as BentoML or Nvidia Triton.
  2. Pipeline Orchestration: MLflow only supports linear pipelines and does not allow building Directed Acyclic Graph (DAG) workflows. This means that you cannot have a pipeline in which models are executed in parallel, which is a big drawback for data science teams that want to build complex pipelines.

Fortunately, MLflow can be used in conjunction with other complementary tools. In my opinion, the tracking and reproducibility features are very useful, and using MLflow collaboratively is a huge step up from not using any tool.

MLflow Basics

In this section, we will go through the most important features from the MLflow documentation, each in detail and with examples on:

  • using the Tracking API with PyTorch to log metrics and artifacts
  • packaging code as an MLflow Project for reproducibility
  • logging an MLflow Model for versioning and sharing
  • setting up a Tracking Server with a Postgres backend for collaboration with your team
  • leveraging the MLflow Model Registry for model deployment and governance

By the end of this article, you will have a solid understanding of how to use MLflow to streamline your machine learning workflow and collaborate effectively with your team 🤝. So stay tuned !


First, install the virtualenvwrapper package:

sudo apt-get install python3.10 python3.10-venv python3-venv
sudo apt-get install python3-virtualenv

pip install virtualenvwrapper
python3.11 -m pip install virtualenvwrapper

echo "export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source ~/.local/bin/" >> ~/.bashrc

Next, create a new Python environment:

mkvirtualenv myenv -p python3.10

Install the dependencies:

python -m pip install --upgrade pip
pip install -r requirements.txt

Upload Experiment’s Logs

Follow along by cloning the MLflow template repository here.

MLflow Quickstart

In the mymodule/ file, you can find a simple example of using the MLflow Tracking API. Here are the MLflow functions to log parameters, metrics and artifacts:

from mlflow import log_metric, log_param, log_artifacts

# Log a parameter (like batch_size, epochs, learning_rate, etc.)
log_param("param_name", parameter)

# Log a metric (like accuracy, loss, etc.)
log_metric("metric_name", metric)

# Log artifacts (like a trained model or a visualization) from a local directory
log_artifacts("path/to/artifacts_dir")
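
As a slightly fuller sketch of these calls: MLflow stores parameters as flat key/value pairs, so a nested configuration has to be flattened before it is logged. The helper names `flatten_params` and `log_run` and the example values are assumptions made for illustration and are not part of the template. `mlflow.log_params`, `mlflow.log_metrics` and `mlflow.log_artifacts` are the batch counterparts of the functions shown above.

```python
from typing import Any, Dict


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys: {"optim": {"lr": 0.1}} -> {"optim.lr": 0.1}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_run(config: Dict[str, Any], metrics: Dict[str, float], artifact_dir: str) -> None:
    """Log parameters, metrics and a directory of artifacts inside one MLflow run."""
    import mlflow  # imported lazily so flatten_params works without MLflow installed

    with mlflow.start_run():
        mlflow.log_params(flatten_params(config))
        mlflow.log_metrics(metrics)
        mlflow.log_artifacts(artifact_dir)  # uploads every file found in artifact_dir
```

A call could look like `log_run({"batch_size": 64, "optim": {"lr": 1e-3}}, {"accuracy": 0.98}, "outputs")`, where `outputs` is a hypothetical local directory that already exists.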

MLflow with PyTorch

In the mymodule/ file, you can find how to upload logs to MLflow inside a PyTorch loop.

Add a simple line to log the train_loss.

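import mlflow

# Assumes size, dataloader, model, loss_fn, optimizer and device are defined earlier in the script.
nb_batches = int(size / dataloader.batch_size)
for batch, (input_data, ground_truth) in enumerate(dataloader):
    input_data = input_data.to(device)
    ground_truth = ground_truth.to(device)

    pred = model(input_data)
    loss = loss_fn(pred, ground_truth)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    loss = loss.item()

    # Add this line to log the train_loss
    mlflow.log_metric('train_loss', loss, step=batch)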
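
One detail worth sketching: `step=batch` restarts at 0 on every epoch, so when training for more than one epoch the points of later epochs land on the same x-axis positions as those of the first. A common workaround (an assumption here, not necessarily what the template does) is to log against a step that keeps increasing across epochs. The names `global_step`, `log_train_loss` and the `log_fn` parameter are hypothetical.

```python
from typing import Callable, Optional


def global_step(epoch: int, batch: int, nb_batches: int) -> int:
    """Index of the current batch counted from the very start of training."""
    return epoch * nb_batches + batch


def log_train_loss(
    loss: float,
    epoch: int,
    batch: int,
    nb_batches: int,
    log_fn: Optional[Callable[..., None]] = None,
) -> int:
    """Log train_loss at a step that keeps increasing across epochs."""
    if log_fn is None:
        import mlflow  # default to the real MLflow logger

        log_fn = mlflow.log_metric
    step = global_step(epoch, batch, nb_batches)
    log_fn("train_loss", loss, step=step)
    return step
```

Inside the loop, this would replace the `mlflow.log_metric` line with `log_train_loss(loss, epoch, batch, len(dataloader))`. Passing `len(dataloader)` rather than a floored division keeps the steps unique even when the last batch is smaller than the others.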

In the next section, we will see how to visualize the results in the MLflow UI. Run python to train an MNIST model and upload the logs to the MLflow server.

MLflow User Interface (UI)

Starting the MLflow UI is as simple as:

mlflow ui

This command starts an MLflow Tracking Server on your local machine. You can then access the UI at http://localhost:5000.

As you can see in the screenshot below, the UI is very intuitive and easy to use:

At the top left corner, you can see the name of the experiment. In the center of the page, you can find a list of all the runs that belong to it. Feel free to click on one to explore the logs.

MLflow Project: Reproducibility Is All You Need

MLflow Project is an awesome module that helps you share your whole machine learning project with others without any headache. You can bundle up your code, dependencies, and parameters in a YAML file and create a source distribution or Docker container that others can easily run on their computer.

It’s super handy because MLflow runs your project in a safe and isolated environment (Conda, Pip, or Docker) so everyone uses the same environment. This makes it simple to track and reproduce the exact settings and parameters that were used to run the project, and the results and outputs generated by the code. It’s great for comparing different versions of your model or experiment, and reproducing the results on different computers.

Packaging your Project

First, let’s install pyenv:

curl -L | bash

Package your project with an MLproject file:

name: MLflow Template
python_env: environment.yaml

entry_points:
  main:
    parameters:
      epochs: {type: int, default: 1}
      batch_size: {type: int, default: 64}
      train_on_first_n: {type: int, default: 0}
    command: "python --epochs {epochs} --batch_size {batch_size} --train_on_first_n {train_on_first_n}"

As you can see, we can pass parameters to the script that matches the argparse arguments defined in the file.
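
To make the link with argparse concrete, here is a minimal sketch of what the argument parsing in the training script could look like. The function names `build_parser` and `parse_args` are hypothetical; the flag names, types and defaults mirror the MLproject parameters above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface whose flags mirror the MLproject parameters."""
    parser = argparse.ArgumentParser(description="Train a model on MNIST")
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--train_on_first_n", type=int, default=0)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

MLflow substitutes each `{name}` placeholder in the `command` string with the value of the corresponding parameter, so the flag names in the script and the parameter names in the MLproject file have to stay in sync.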

We will use an environment.yaml file to specify the virtualenv dependencies:

python: "3.10"
# Dependencies required to build packages. This field is optional.
build_dependencies:
  - pip
  - setuptools
  - wheel==0.37.1
# Dependencies required to run the project.
dependencies:
  - mlflow
  - torch

Running MLflow Project

To run an MLflow Project, you need to specify the URI of the project. The URI can be a local path. For example, to run the project in the current directory, you can run:

mlflow run . --experiment-name <your-experiment-name>

You can pass command line arguments this way:

mlflow run . --experiment-name <your-experiment-name> -P epochs=10 -P batch_size=128 -P train_on_first_n=1048
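
If you prefer to launch runs from Python, for example to loop over several parameter values, the same command line can be assembled programmatically. This is only a sketch: `build_mlflow_run_command` and `launch` are hypothetical helpers, and they rely solely on the `--experiment-name` and `-P` flags shown above.

```python
import subprocess
from typing import Dict, List, Union

ParamValue = Union[int, float, str]


def build_mlflow_run_command(
    uri: str, experiment_name: str, params: Dict[str, ParamValue]
) -> List[str]:
    """Build the argument list for `mlflow run`, with one -P flag per parameter."""
    command = ["mlflow", "run", uri, "--experiment-name", experiment_name]
    for name, value in params.items():
        command += ["-P", f"{name}={value}"]
    return command


def launch(uri: str, experiment_name: str, params: Dict[str, ParamValue]) -> None:
    """Run the project and raise CalledProcessError if the run fails."""
    subprocess.run(build_mlflow_run_command(uri, experiment_name, params), check=True)
```

Passing the arguments as a list avoids shell quoting issues when a parameter value contains spaces.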

Now prepare to have your mind blown 🤯. You can also run the project from a GitHub repository URI:

mlflow run \
--experiment-name <your-experiment-name> \
-P epochs=10 \
-P batch_size=128 \
-P train_on_first_n=1048

That’s right ! You can run a project from a GitHub repository URL without even having to clone or create the project environment. This opens up a whole new world of possibilities for reproducibility and collaboration.

Imagine you are working on a project with a team of data scientists. You can now share your project URL with your team members and they can run it ! They can simply change the input data and parameters and it will work out of the box. Plus, they can share the result of the experiment with you using the MLflow UI 😍

MLflow Models

The MLflow Models module is also an awesome addition to the framework. With this handy tool, you can easily track your model’s weights and architecture by logging them onto the MLflow Server 📈 It’s a great way to keep your machine learning models in check !

Saving a Model

Logging a PyTorch model can be done quite easily. By including a requirements.txt file that lists the necessary dependencies, MLflow records them alongside the model so that they can be installed in the environment where the model is loaded:

mlflow.pytorch.log_model(
    pytorch_model=model,
    artifact_path="model",
    pip_requirements=['torch', '-r requirements.txt'],
)

Loading a Model

For loading a model, go to the Run tab in the MLflow UI, and locate the run_id. You can find the run_id in the image below:

Now, you can effortlessly load the model using:
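
A minimal sketch of that call, assuming the model was logged under the artifact path `model` (this must match whatever path was used when the model was logged). The helper names `run_model_uri` and `load_model_from_run` are hypothetical; the `runs:/<run_id>/<artifact_path>` URI scheme and `mlflow.pytorch.load_model` are part of MLflow.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build the URI of a model that was logged as an artifact of a given run."""
    if not run_id:
        raise ValueError("run_id must not be empty")
    return f"runs:/{run_id}/{artifact_path}"


def load_model_from_run(run_id: str, artifact_path: str = "model"):
    """Load a PyTorch model logged in the run identified by run_id."""
    import mlflow.pytorch  # imported lazily so run_model_uri works without MLflow installed

    return mlflow.pytorch.load_model(model_uri=run_model_uri(run_id, artifact_path))
```

Copy the run_id from the UI and pass it as a string, for example `load_model_from_run("<your-run-id>")`.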


You can also load the model from its name and version. If you used the registered_model_name argument when logging the model, go to the MLflow UI, go to the Models tab and copy the name and version as in the image below:

Then, you can load the model with the name and version:

model = mlflow.pytorch.load_model(model_uri="models:/<model-name>/<version>")


Stay tuned 👀, as we will delve deeper into the MLflow Model Registry in the next section.

Tracking Server With Postgres Backend

Before being able to use the MLflow Model Registry, you need to run an MLflow tracking server with a database backend.

Here is how to do it in 8 simple steps 🔢

  1. Install postgresql:
sudo apt-get install postgresql postgresql-contrib postgresql-server-dev-all
  2. Run psql with the default postgres user:
sudo -u postgres psql
  3. Create a database and user (the names and password match the connection string used in step 6):
CREATE USER mlflow WITH ENCRYPTED PASSWORD 'mlflow';
CREATE DATABASE mlflow OWNER mlflow;
  4. Install the psycopg2 library to connect to Postgres from Python:
sudo apt install gcc
pip install psycopg2
  5. Create a mlruns directory to store the MLflow tracking data:
mkdir ~/mlruns
  6. Run a local MLflow tracking server with the Postgres backend:
mlflow server --backend-store-uri postgresql://mlflow:mlflow@localhost/mlflow --default-artifact-root file:/home/boxy/mlruns -h -p 5000
  7. Add the following line to ~/.bashrc to set the MLflow tracking URI:
echo "export MLFLOW_TRACKING_URI=" >> ~/.bashrc
  8. Run the following command to run the MLflow project:
mlflow run . --experiment-name mnist_pytorch
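
The backend store URI used by the server follows the usual `postgresql://user:password@host/database` pattern. As a hedged sketch, the helper below (a hypothetical `postgres_backend_uri`) assembles it and percent-encodes the credentials, which matters as soon as the password contains characters such as `@` or `/`. The second helper, also hypothetical, points a Python client at the server with `mlflow.set_tracking_uri` instead of the environment variable; the address to pass is that of your own server.

```python
from typing import Optional
from urllib.parse import quote


def postgres_backend_uri(
    user: str, password: str, host: str, database: str, port: Optional[int] = None
) -> str:
    """Build a postgresql:// URI suitable for `mlflow server --backend-store-uri`."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    location = host if port is None else f"{host}:{port}"
    return f"postgresql://{credentials}@{location}/{database}"


def use_tracking_server(tracking_uri: str) -> None:
    """Make the current Python process log to the given tracking server."""
    import mlflow  # imported lazily so postgres_backend_uri works without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)
```

With the values used in this section, `postgres_backend_uri("mlflow", "mlflow", "localhost", "mlflow")` gives back the URI passed to `mlflow server` above.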

Open your MLflow tracking server in your browser and you should see the run logged in the Runs tab.

Congrats!🎉 You now have MLflow running on a database backend! Next up, let’s explore the wonders of the MLflow Model Registry.

Model Registry: Tags & Versions

The MLflow Model Registry is a powerful tool that enables you to manage and organize your models by registering and versioning them. Registering a model involves creating a permanent record of a trained machine learning model in a central repository.

You can use the Model Registry to:

  • Register models
  • Transition models through stages of development and deployment
  • Manage model versions
  • Assign model versions to stages
  • Query models and model versions
  • Manage model lineage

Registering a Model

There are two ways to register a model:

  • From the MLflow UI (see this link)
  • From the MLflow API

In this example, we are using the MLflow API to register a model:

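mlflow.pytorch.log_model(
    pytorch_model=model,
    artifact_path="model",
    registered_model_name="mnist",
    pip_requirements=['torch', '-r requirements.txt'],
)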

As you can see, the registered_model_name argument is used to tag the model with a name. Each time we log a model with the same registered_model_name, a new version of the model is created.

Model Stages

In MLflow, stages are labels assigned to the versions of a registered model. A version can be transitioned from one stage to another, such as from Staging to Production. Stages allow you to organize your different model versions and keep track of their evolution.

A model version can be in one of the following stages:

  • None: The model version has not been assigned a stage yet.
  • Staging: The model is being tested.
  • Production: The model is in production.
  • Archived: The model is archived.

You can change the stage of a model from the UI following this link or using the MLflow API as follows:

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="mnist", version=1, stage="Production"
)

Fetching a Model

Now this is the most exciting part! You can fetch a model from the registry using the MLflow API. In a production environment, you will load the Production version of the model. You do not have to change the code or the path to the weights: once you move a new version to the Production stage, the stage-based URI resolves to that version the next time the model is loaded.

Here’s how to fetch a model from the registry with a model version:

model_name = "mnist"
model_version = 1
model = mlflow.pytorch.load_model(model_uri=f"models:/{model_name}/{model_version}")

We can also fetch a model from the registry with a model stage:

model_name = "mnist"
model_stage = "Production"
model = mlflow.pytorch.load_model(model_uri=f"models:/{model_name}/{model_stage}")
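
Both snippets differ only in the last segment of the `models:/` URI. A small helper can build either form and catch a mistyped stage before it reaches the server. This is a sketch under assumptions: `registry_model_uri` and `VALID_STAGES` are hypothetical names, and the accepted stages are the three named stages listed earlier.

```python
from typing import Optional

VALID_STAGES = ("Staging", "Production", "Archived")


def registry_model_uri(
    name: str, version: Optional[int] = None, stage: Optional[str] = None
) -> str:
    """Build a models:/ URI from a model name and either a version or a stage."""
    if (version is None) == (stage is None):
        raise ValueError("pass exactly one of version or stage")
    if stage is not None and stage not in VALID_STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {VALID_STAGES}")
    suffix = str(version) if version is not None else stage
    return f"models:/{name}/{suffix}"
```

The result can be passed directly as `model_uri` to `mlflow.pytorch.load_model`.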


Congrats ! 🎉

You’ve learned the basics of MLflow and how to use it to track your machine learning experiments and models. Whether you’re a data scientist, machine learning engineer, or researcher, MLflow can help you streamline your workflow, collaborate with your team, and gain valuable insights from your experiments. So why not give it a try and take your machine learning projects to the next level ?

But wait, there’s more ! 🤩 You might be wondering how to collaborate with your team if MLflow is running on your local machine. No worries, I’ve got you covered 👍 In the next article, we’ll explore how to deploy MLflow on the Google Cloud Platform with Terraform and make it accessible to your entire organization.

Stay in touch

I hope you enjoyed this article as much as I enjoyed writing it !
If so, feel free to support my work by interacting with my content on LinkedIn 👀
You can also subscribe to be notified of the latest articles I publish 😌