MLflow: An Open Platform to Simplify the Machine Learning Lifecycle



Corey Zumar offers an overview of MLflow – a new open source platform to simplify the machine learning lifecycle from Databricks. MLflow provides APIs for tracking experiment runs between multiple users within a reproducible environment and for managing the deployment of models to production. MLflow is designed to be an open, modular platform.


Corey Zumar is a software engineer at Databricks, where he’s working on machine learning infrastructure and APIs for the machine learning lifecycle, including model management and production deployment. He is also an active developer of MLflow.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Zumar: I'm Corey [Zumar], I'm a software engineer at Databricks and today I'll be talking about MLflow, a platform for the complete machine learning life cycle. MLflow is an open source project that originally got its start at Databricks and it's developing a large contributor community.

To start off, let me provide an outline of the topics that I'll be covering over the next 40 minutes. I'll begin with an overview of some of the critical challenges associated with developing machine learning applications. After that, I'll introduce MLflow specifically addressing how MLflow is designed to tackle each of these challenges. You'll get an overview of the three critical components of MLflow, tracking, projects, and models, as well as a compelling demo for each component. Finally, you'll learn about the ongoing roadmap for MLflow as well as how to get started with the project.

Machine Learning Development is Complex

Let's dive right into the challenges associated with machine learning application development. As many of you are probably painfully aware, developing machine learning applications is complex. To expand on that a bit, let's take a look at the typical machine learning life cycle. It's a four-step process that begins with the collection of raw data. Once data has been collected, it is cleaned and processed, and ultimately this processed data is used to fit or train a model. Finally, this model is deployed to a production environment to satisfy either a business or production research use case. Oftentimes these production applications receive new data that was not part of the original training dataset, kickstarting the next iteration of this life cycle.

At face value, this may sound somewhat straightforward, we have a four-step process, but there are several layers of difficulty and complexity in implementation that actually make this a very daunting task for many organizations. Let's take a look at the first one. If I were to sit down and try to implement any particular phase of this life cycle, the first thing I notice is that there are a large number of tools available for any one stage, but there is no single tool that implements all four. Practically, that means that as a developer I'm stitching together code for data ingest and preparation like Kafka with model training frameworks like TensorFlow and deployment environments like Kubernetes, and then repeating the process each time I add a new tool, for example, running the same funnel through Scikit-learn and deploying to Amazon SageMaker. Oftentimes, this problem lends itself to the development of brittle pipeline code that is prone to failure as APIs evolve and as organizations scale.

Provided your organization is able to successfully craft pipelines to stitch together various tools, the work is still not complete. We also observe that hyper parameter tuning is a major element of the machine learning life cycle and must be supported. Models are highly sensitive to configuration parameters, referred to as hyper parameters, that dramatically impact their performance. Selecting the appropriate set of parameters can produce a model that will revolutionize a production use case, but failure to select reasonable parameters may produce a result that's no better or even worse than guesswork. It's paramount that when you're developing a platform that implements this life cycle, you allow data scientists to adequately explore this parameter space.

Additionally, we note that scale becomes a problem when implementing a solution to this life cycle. While pipeline code may work for a small handful of developers, we often see that this pipeline code fails to scale properly to large organizations, and this is becoming increasingly problematic as the number of machine learning practitioners increases.

Finally, and very importantly, we note that model exchange and governance is a major problem implicated by the machine learning life cycle. You see, it's not sufficient to train a model once in a black box, deploy it and forget where it came from. For every model within your organization, there needs to be a complete lineage or track record of the hyper parameters that were used to train that model as well as source code, performance metrics and information about who trained the model and when that model was trained. This is particularly important for organizations that are beginning to leverage machine learning in high scrutiny environments such as the financial sector.

Now that we've expanded on this set of difficulties associated with the life cycle, it's clear that we don't have a simple four-step process, but rather a daunting platform development challenge facing many organizations that are attempting to leverage machine learning today.

You may be asking, are there any platform solutions right now that standardize this life cycle? The answer is, sort of. We've identified that several platforms from large established organizations such as Facebook's FBLearner, Uber's Michelangelo, and Google's TFX attempt to standardize the machine learning life cycle. However, there are several drawbacks associated with these platforms. The biggest one, which I'll mention first, is that they're not entirely open source and are often tied to a particular company's infrastructure. What that means is unless you're a data scientist or machine learning developer who is lucky enough to work at one of these companies, you often can't leverage the full benefits of their standardized life cycle.

The second aspect is that even internally, these platforms limit the set of tools, programming languages and algorithms that data scientists can leverage when building their models. For example, while TFX may be an excellent platform for writing TensorFlow models, it is less apt for developers or users who want to write classical models in frameworks like Scikit-learn.

This leads us to a motivating question, can we provide the benefits of a standardized life cycle in a similar vein to these platforms, but do so in an open manner allowing data scientists to bring the set of tools and languages and algorithms that they require on a daily basis?

Introducing MLflow

Introducing MLflow - an open source platform for the machine learning life cycle. MLflow is built on an open interface philosophy, defining several key abstractions that allow existing infrastructure and machine learning algorithms to be integrated with the system easily. This means that if you're a developer who wants to leverage MLflow and you're using a particular framework that's currently unsupported, the open interface design makes it extremely easy to integrate that framework and start working with the platform. Effectively, this means that MLflow is designed in principle to work with any machine learning library or any language.

Further, MLflow facilitates reproducibility, meaning that the same training or production machine learning code is designed to execute with the same results regardless of environment, whether in the cloud, on a local machine, or in a notebook. Finally, MLflow is designed with scalability in mind, meaning it is just as useful for a small team of data scientists as it is for a large organization consisting of potentially thousands of machine learning practitioners.

Now that I've provided a high level overview of the motivation for MLflow, I'm going to walk through each of MLflow's three components: tracking, projects, and models. Tracking is a centralized repository for metadata about training sessions within an organization. Projects is a reproducible, self-contained packaging format for model training code, ensuring that the training code runs the same way regardless of the execution environment. Models is a general purpose model format enabling any model produced with MLflow to be deployed to a variety of production environments.

MLflow Tracking

I'm going to begin by diving into MLflow Tracking. There are several key concepts that a centralized training metadata repository needs to capture, and MLflow enables the collection of each of them. The first concept is the set of important hyper parameters or configuration knobs that impact model performance. These can all be saved using MLflow's APIs and centralized tracking service. Additionally, users can log performance metrics that provide insights into the effectiveness of their machine learning models. For reproducibility, MLflow also enables users to log the particular source code that was used to produce a model as well as its version by integrating tightly with Git to map every model to a particular commit hash.

Further and perhaps most importantly, for reproducibility, MLflow can also be used to log artifacts, which are any arbitrary files including training and test data and models themselves. That means that if I'm a developer who just trained a model, I can persist it to the centralized tracking service and one of my colleagues can load that model sometime later and either continue to train and experiment or productionize that model to satisfy a particular need. Finally, for any concept that's not supported as a first-class entity in the tracking ecosystem, MLflow provides support for flexible tags and notes associated with the training session. For example, notes might be a good place to drop some information about the business use case for which a model is being developed.

Now that we're familiar with the set of concepts that MLflow tracking is designed to manage, let's take a look at how the MLflow tracking service integrates with the existing training ecosystem. We observed that developers and machine learning practitioners train their models in a variety of locations. Some prefer to use cloud hosted notebook services like JupyterLab or Databricks. Others run model training jobs on-prem or on their local machines. A third group uses remote task execution services in the cloud, and there are many others. Regardless of environment, the goal for MLflow tracking is to allow users to capture all of this important metadata. Users do so through MLflow tracking APIs, written in Python, Java, and R and also available RESTfully, that make it easy to instrument training code and retrieve all of that content regardless of where the code is being executed.

Once this information has been collected, it is aggregated by a centralized tracking server that is pluggable and designed to run on a variety of popular infrastructure that you may be using today. Further, this tracking server exposes a view of all of this training information both through a high performance illuminating user interface as well as a set of programmatic APIs so that developers can interact directly with the data and perform analytics about the models that are being trained within their organization.

Let's take a look at an example of the MLflow tracking API in Python. In this case, we're going to walk through a very simple example whereby a user hypothetically trains a machine learning model and logs the associated metadata that I just talked about on the previous slide. It all starts with the creation of a new Run using the mlflow.start_run directive. A Run is a term for a training session. Once a training session has been initialized, MLflow's log_param directive can be used to log hyper parameter information, for example, the number of layers in a neural network or the alpha parameter associated with a logistic regression algorithm. After this point, users can simply inject any training code that they may already be executing. MLflow is not at all prescriptive or restrictive about the type of code that can run within a model training session.

After the model has been fit, performance metrics are obtained and these are logged using the mlflow.log_metric directive. We oftentimes observe that research groups and businesses produce visualizations of model performance using sophisticated plotting tools, and these visualizations can be directly logged using the mlflow.log_artifact directive, which works for any arbitrary file. It's also worth noting that the MLflow UI provides its own visualizations as well.

Finally, the most important part for reproducibility comes from an MLflow model-specific integration for persisting the model artifact itself. For example, you see here the mlflow.tensorflow.log_model directive, which will take your TensorFlow graph, persist it in an MLflow model format that you'll learn more about later, and upload it to the centralized repository so that colleagues can download and begin using that model later on.
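The shape of that walkthrough can be sketched in a few lines. This is a minimal illustration rather than the code from the slide: `log_training_run`, `fake_train`, and `RecordingTracker` are hypothetical names, and the training step is a placeholder. The function relies only on `start_run`, `log_param`, and `log_metric`, so passing the `mlflow` module itself as `tracker` should issue the same calls against a real tracking server, while the stub lets the sketch run without MLflow installed.

```python
from contextlib import contextmanager


class RecordingTracker:
    """Hypothetical stand-in exposing the three tracking calls used below."""

    def __init__(self):
        self.params = {}
        self.metrics = {}
        self.active = False

    @contextmanager
    def start_run(self):
        # A run brackets one training session.
        self.active = True
        try:
            yield self
        finally:
            self.active = False

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value


def fake_train(alpha, layers):
    """Placeholder for real training code; returns a made-up accuracy."""
    return round(1.0 - alpha / (layers + 1), 4)


def log_training_run(tracker, alpha, layers):
    """Open a run, log hyper parameters, train, then log the metric."""
    with tracker.start_run():
        tracker.log_param("alpha", alpha)
        tracker.log_param("layers", layers)
        accuracy = fake_train(alpha, layers)
        tracker.log_metric("accuracy", accuracy)
    return accuracy


tracker = RecordingTracker()
log_training_run(tracker, alpha=0.5, layers=4)
print(tracker.params, tracker.metrics)
```

Keeping the tracker as an argument is only a convenience for this sketch; in an instrumented script the calls would typically be made on the imported `mlflow` module directly.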

Demo: Instrument Keras Training Code with MLflow Tracking APIs

Now that we're familiar with the overview of the tracking component, we're going to walk through the first part of our demo, and I realize this is a training data set that you're likely all familiar with. We're going to be talking about digit classification with MNIST, and the reason I'm doing so is because MNIST provides a very self-contained training data set and problem that emphasizes how easy it is to integrate MLflow with your existing training code. The first thing that we're going to do is take a Keras feed-forward neural network training script that I've written, and we're going to instrument it using those Python tracking APIs and see just how easy it is to collect all of this metadata.

The first thing I'll do is hop over to my terminal. I'm going to take a look at this Keras train Python script and walk through the overall structure. Initially, we see that this training script defines a set of arguments, a set of hyper parameters that are going to impact model performance such as the number of epochs, the learning rate, and the batch size. After that, the script loads the MNIST digit classification dataset using a Keras API and then it defines a feed-forward neural network using the Keras Sequential API. After this, it simply fits the model on the training dataset that we loaded previously, and after the model has been trained, it is ultimately evaluated on test data and produces test metrics.

Now what we're going to do is instrument this code with MLflow. The first thing that I'll do is import the MLflow library as well as the Keras-specific MLflow module that's going to be used to persist that model to the centralized tracking repository. After we've identified the training parameters and hyper parameters for our model, we're going to log those as well. In this case, we leverage the log_param directive that we saw on the previous slide to log the batch size, the epochs, and the learning rate.

The next thing that we'll do is define a Keras callback that is going to log some metrics associated with the model after each training epoch. We're going to go ahead and pass this callback into the model's fit function so that our metric logging executes after each training epoch. This is going to instrument the process of training. The next thing that we're going to do is obtain those evaluation metrics, and we're going to log the test loss and test accuracy at the end using the same log_metric directive.

Finally, the most important part for reproducibility, we're going to go ahead and use the mlflow.keras module to log the model itself to the centralized tracking repository. I'll go ahead and save this file, and we'll note that we didn't add any more than 20 lines of code here to fully instrument our existing training script. MLflow's Python APIs are super lightweight and powerful. The next thing that I'll note is I've defined this MLflow tracking URI environment variable in my terminal, and this environment variable refers to a particular instance of the MLflow tracking server running on EC2.

When I run this training script, MLflow is going to pick up this environment variable and forward all of the data to this centralized tracking repository. I'm going to go ahead and run this, and I'll specify that we're going to use two training epochs, which hopefully doesn't take too long. While this runs, I'll hop over to the MLflow UI and provide an overview of how training sessions are visualized. We see that we have a list of training sessions under a particular experiment. An experiment is simply a collection of training sessions that are aggregated around a particular use case. We see that important metadata such as the user who produced the model is recorded here, as well as the source code that was used to train that model, the set of hyper parameters (batch size, dropout, and learning rate, for example), and then metrics.

If I go ahead and refresh, we should see a new model training session corresponding to the Run that we just created. We'll see that we have a new session from about a minute ago, we see that I trained it, and we get the name of the exact source script that was run. We see that we trained with two epochs. If we go ahead and click, we can hone in on some more tabular information about the model training session and visualize the set of produced artifacts. We'll come back to this specific view of this training session later on in the demo.

MLflow Backend Stores

Now I'm going to hop back into the presentation and talk a little bit about how you can get started with MLflow tracking within your organization. The MLflow tracking service backend is divided into two components. The first is an entity or metadata store that's designed to collect and aggregate all of the lightweight metadata associated with a training session. This is for your metrics, your parameters, and source and version, for example. The metadata store is designed to work with any UNIX or Windows file system. It's also compatible with a wide variety of SQL databases via SQLAlchemy. These include Postgres, MySQL, as well as SQLite.

Finally, for organizations that wish to bring their own infrastructure, the metadata store provides a RESTful abstraction, allowing your organization to plug in any particular infrastructure RESTfully and implement the metadata store abstraction. The second aspect of the MLflow backend is for heavier weight artifacts such as those training data files and models. The artifact store is also designed to run on top of a variety of existing production infrastructure such as Amazon S3, Azure Blob Storage, HDFS, Google Cloud Storage, the Databricks file system, FTP and SFTP. There are a large number of options that likely fit your organization's use case for getting started with MLflow tracking.

MLflow Projects

Now that we've seen an overview of the tracking component, I'm going to talk about MLflow projects, a reproducible packaging format for model training sessions regardless of execution context. To start with motivation, we observed that businesses are leveraging a diverse set of machine learning training tools, but they're also running these training tools in a diverse set of environments, as we saw in one of the previous slides. For example, they may be running their training code in the cloud, they may be running on a local machine, they may be running in a notebook.

This leads to a challenge, which is that machine learning results are difficult to reproduce. Oftentimes, the same exact training code doesn't run the same way or produce the same results in two different places. MLflow's solution to this is a self-contained training code project specification that bundles all of the machine learning training code along with its versioned library dependencies, its configuration and its training and test data. By fully specifying the complete set of dependencies for a machine learning training task, MLflow enforces reproducibility across execution environments. It does this by installing all those libraries and achieving the exact same system state wherever the code is running.

What does an MLflow project look like? At its core, an MLflow project is simply a directory. It's a directory with an optional configuration file, and it contains the training code, the library dependency specification, and other data required by the training session. These library dependencies are specified in multiple ways. For example, users can include a YAML-formatted Conda environment specification to enumerate their training code's library dependencies. They can also include a Docker container, and MLflow will execute that training code within the specified container.

Finally, MLflow provides a CLI for executing these projects as well as APIs in Python, R, and Java. These projects can be executed both on the user's local machine as well as several remote environments including the Databricks job scheduler as well as Kubernetes.

What does an example MLflow project actually look like? On the left we see that we have a simple directory structure as alluded to before. At the top level, there's this MLproject configuration file, which we'll come back to in a minute. We also have a conda.yaml file specifying the collection of library dependencies. Finally, we have some training scripts, and this would also be the place to bundle in training and test data or hooks to that information. Honing in on the MLproject configuration file, we see that the configuration references that library dependencies conda file. If I were using a Docker container, I would define a docker_env variable instead with a fully qualified URI to my specific container.

Additionally and more interestingly, the configuration file specifies a set of entry points, each of which consists of a command and then the user configurable parameters to pass to that command at runtime. MLflow does not restrict the types of commands that can exist within a configuration file. As long as it'll run in a bash shell or a Windows shell, it's a valid MLflow project command, and in this case, we'll see that we have a couple of parameters. The first is training data, and this is a special type called path. Your training data can actually reference external artifacts that exist on a variety of storage solutions: the local file system, S3, and other repositories. We also have a simple float parameter, lambda, with a default value of 0.1.

Honing in on the bottom right hand of the screen, we see an example of how the mlflow run CLI can be used to execute a project. What I'd like to call out here is that mlflow run is directly compatible with projects that exist on GitHub. MLflow will automatically clone that repository, check out the required Git commit and then begin executing the code.
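To make the entry-point mechanics concrete, here is a rough sketch of how a command template and its parameters could be combined. It is an illustration only, not MLflow's implementation: the file contents, the `train.py` script name, the project name, and the `render_command` helper are assumptions modeled on the slide's description (a `path` parameter for training data and a float `lambda` defaulting to 0.1).

```python
# Assumed MLproject contents, kept as text for reference only (not parsed here).
MLPROJECT_TEXT = """\
name: example_project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: "python train.py {training_data} {lambda}"
"""

# The same entry point expressed as a Python dict for this sketch.
ENTRY_POINT = {
    "parameters": {
        "training_data": {"type": "path"},
        "lambda": {"type": "float", "default": 0.1},
    },
    "command": "python train.py {training_data} {lambda}",
}


def render_command(entry_point, overrides):
    """Fill the command template with user-supplied values and defaults."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in overrides:
            values[name] = overrides[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
    return entry_point["command"].format(**values)


print(render_command(ENTRY_POINT, {"training_data": "data/train.csv"}))
```

On the command line, parameter values are supplied to `mlflow run` with `-P name=value`; the dictionary of overrides above plays that role in the sketch.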

Demo: Run Training Code as an MLflow Project

Let's continue our demo and see how MLflow projects make it easy to package reproducible machine learning code. I'm going to hop back over to Google Chrome, and we're going to take a look at an example GitHub-hosted MLflow project. You'll see that in this directory structure we have the MLproject configuration file. We have a conda.yaml file specifying a set of dependencies, and these include Keras, a GPU installation of TensorFlow, as well as a specific version of the MLflow library. Going back, we also see that we have our training script, and if we hone in on the MLproject file, we'll see that we reference our conda environment and we define an entry point with a set of parameters that are familiar from the training script in the tracking section of the demo. We have the batch size, we have the epochs, and we have the learning rate.

It's worth calling out that when you run an MLflow project with parameters, all of these parameters are automatically logged to the tracking service, which means that you don't have to instrument project code with that log_param directive. All of this is automatically recorded. What I'm going to do here is copy the URL of that Git repository and hop back over to our terminal. What I'll do now is SSH into a remote GPU-based instance, and I'm going to go ahead and set that same MLflow tracking URI environment variable to communicate with that same centralized tracking repository. Now, I'll go ahead and leverage the mlflow run CLI to execute that GitHub-hosted MLflow project. I'll pass in a larger number of epochs and I'll go ahead and hit enter. MLflow is going to clone that Git repository, activate its associated conda environment and begin training. We'll see that because we're running on a GPU, the training happens a lot more quickly and we ultimately produce a model with better performance.

If we hop back over to our tracking UI and go ahead and refresh, we'll see that we have a new training session corresponding to that remote MLflow project run. We can go ahead and click on the source file and it takes us right back to that same GitHub repository, providing a complete lineage of exactly how this model was produced. Additionally, we can go ahead and compare the performance of this model to the performance of the locally-trained model. We'll go ahead and select these two tick boxes and hit compare. In this case, we have a tabular view that demonstrates the set of parameters associated with each training session as well as its metrics. If you want to get a more visual take on the performance tradeoffs, we can go ahead and hone in on a metric like training accuracy, and we'll see that it's plotted both relative to wall clock time as well as to the number of training epochs. In this case, we can see that after training for two epochs, for example, that locally-trained model did pretty well, but by training for additional epochs, we ended up getting better accuracy.

That concludes the MLflow projects component of the demo, and we'll return to that same project Run when attempting to deploy the model for serving later on in the demo.

MLflow Models

First, I'd like to talk about MLflow models, a general purpose model format supporting a diverse variety of productionization environments. Now, the motivation for MLflow models is very similar to the motivation for projects. We again observe that models can be written using a wide variety of tools, but they can also be productionized or deployed in a wide variety of environments as is distinct from training environments. These environments include real-time serving tools such as Kubernetes or Amazon SageMaker, as well as tools for streaming and batch scoring like Spark. Additionally, some organizations may wish to stand up models as a RESTful web service running on a preconfigured cloud instance.

It's tempting, as an organization that wants to deploy to real time and to batch and is using several machine learning tools, to write these deployment pipelines from a particular tool to a particular environment. For example, a business might stitch together TensorFlow with Kubernetes, and a research organization may stitch together Scikit-learn models with the batch scoring feature in Spark. What we find is that as the number of tools that organizations are using scales and as they begin to productionize in new ways, we end up with this kind of one-to-one rat's nest mapping that becomes hard to maintain over time. The solution to this problem of mapping M frameworks to N different deployment environments is a unified model abstraction called an MLflow model that can be produced using a variety of common ML tools and then deployed to a variety of machine learning environments, providing this intermediate layer and avoiding the one-to-one mapping problem.

What does an MLflow model look like? Similarly to a project, an MLflow model is also a directory structure. It contains a configuration file, and instead of containing training code this time, it contains a serialized model artifact. Like a project, it also contains a set of dependencies for reproducibility. This time we're talking about evaluation dependencies in the form of a conda environment. Additionally, MLflow provides model creation utilities for serializing models from a variety of popular frameworks in MLflow format. Finally, MLflow introduces deployment APIs for productionizing any MLflow model to a variety of services, and these APIs are available in Python, Java, R, and via a CLI.

Let's take a look at an example MLflow model that could be produced using the convenience utility mlflow.tensorflow.log_model. By calling this function, we obtain a directory structure similar in nature to a project. At the top layer, we have the MLmodel configuration file. We also have, in this case, a serialized TensorFlow estimator containing a graph and a collection of variables. Focusing in on that configuration file, we'll see that this time it contains some important metadata about the specific model. In this case, that is the run ID, which is a unique identifier for the training session that produced it, as well as the time that it was created. Additionally, this configuration file contains an important field called flavors. A flavor is a language- and tool-specific representation of an MLflow model.

In this example, we have two flavors that have been bundled with the model. We have the TensorFlow flavor and we have the Python function flavor. With the TensorFlow flavor, the MLflow model can be loaded as a native TensorFlow object, for example, a tf.estimator instance or a TF graph, and this makes it usable with any TensorFlow API for evaluation or continued training. With the Python function flavor, MLflow introduces an additional layer of abstraction for loading and evaluating this model. Via Python function, an MLflow model can be represented as a vanilla Python function accepting a Pandas data frame, meaning that in order to load and evaluate this model, I no longer have to reason about the internals of the TensorFlow library.

Model Flavors Example

To expand on model flavors a bit, let's walk through a hypothetical example where the user trains the model, logs it to the tracking service and then sometime later loads it and evaluates it. The first step would be to train a model using a framework like Keras, which we've been doing in the demos. The next step is to persist it using the mlflow.keras.log_model directive. This produces an MLflow model format with two flavors. The first is that Python function flavor, abbreviated Pyfunc, that you saw previously, and the second is a Keras-specific flavor.

If I were to try to load and evaluate the Pyfunc representation of an MLflow model, we see that the evaluation code is very simple. By invoking mlflow.pyfunc.load_pyfunc, I represent this model as a vanilla Python function. Then to evaluate it, I simply pass in a Pandas data frame and get back a Pandas data frame as output. It's two lines of very simple code that completely abstracts away the details and guts of Keras.

Optionally, users can also load the Keras-specific flavor to obtain a native Keras object. In this case, mlflow.keras.load_model yields a Keras model object which can be evaluated using the Keras-specific model.predict API, passing in yet another Keras-specific option. This highlights how MLflow flavors allow users to interact with their models at differing levels of abstraction to meet their particular use case. I would like to highlight that the Pyfunc abstraction is extremely useful when we consider the number of model creation utilities that MLflow supports. For example, models trained in Keras, TensorFlow, Spark, Scikit-learn, PyTorch and several other frameworks are all automatically serialized with this Python function representation. That means that those same two lines of code that I use to load and evaluate that Keras model are compatible with any model in the MLflow ecosystem that contains this Python function format, so for deployment engineers, it's super easy to write thin evaluation layers that are compatible with the wide set of models that are being developed within an organization.
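The idea of a thin evaluation layer over a generic flavor can be illustrated with a small, framework-free sketch. Everything here is hypothetical: `ThresholdModel` and `LinearModel` stand in for models from two different frameworks, `PyFuncLike` imitates the role of the Python function flavor, and plain lists of dicts are used in place of Pandas data frames so that the example has no dependencies. It is not how MLflow implements flavors.

```python
class ThresholdModel:
    """Hypothetical 'framework A' model with its own native API."""

    def __init__(self, threshold):
        self.threshold = threshold

    def classify(self, value):
        return int(value >= self.threshold)


class LinearModel:
    """Hypothetical 'framework B' model with a different native API."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def score_batch(self, values):
        return [self.weight * v + self.bias for v in values]


class PyFuncLike:
    """Generic wrapper: every model is called as predict(rows) -> rows."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn

    def predict(self, rows):
        return self._predict_fn(rows)


def wrap_threshold(model):
    # Adapter from framework A's per-value API to the generic interface.
    return PyFuncLike(
        lambda rows: [{"prediction": model.classify(r["x"])} for r in rows]
    )


def wrap_linear(model):
    # Adapter from framework B's batch API to the generic interface.
    def predict(rows):
        outputs = model.score_batch([r["x"] for r in rows])
        return [{"prediction": o} for o in outputs]

    return PyFuncLike(predict)


def evaluate(generic_model, rows):
    """Thin evaluation layer: knows nothing about the underlying framework."""
    return [out["prediction"] for out in generic_model.predict(rows)]


rows = [{"x": 1.0}, {"x": 3.0}]
print(evaluate(wrap_threshold(ThresholdModel(2.0)), rows))
print(evaluate(wrap_linear(LinearModel(2.0, 1.0)), rows))
```

The point of the design is that `evaluate` is written once, while each framework contributes one adapter, which is the M-plus-N arrangement the talk contrasts with one-to-one pipelines.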

Demo: Deploy an MLflow Model for Real-Time Serving

Now, I'd like to return to our demo and see how we can deploy an MLflow model locally for real-time serving. The first thing that I'm going to do is hop back into Google Chrome and pan over to this simple web app for hand-drawn digit classification. The goal here is to be able to draw a digit such as five and forward predictions to a hosted model. Now you notice this doesn't quite work yet because we haven't actually hooked up a model to that web app. The first thing that we're going to do is go back into MLflow and open up the training session corresponding to our previous remote project run. This training session has a unique identifier, or run ID, that we'll copy for later reference. If we scroll down, we also see that there are some artifacts associated with our training session.

We particularly call out this Keras pyfunc representation of our model, which we're going to use to load and serve in real time. This Python function representation specifies the conda environment that will be activated when running that model as well as the model configuration file with those important attributes such as the set of flavors, the run ID, and the time created. If I head back to my terminal, I can go ahead and exit my GPU instance, go ahead and make sure that my MLflow tracking URI is still set, and now I'll leverage the MLflow models CLI to serve the model as a local RESTful web service. I can type mlflow models serve and pass in a unique reference to that model artifact. In this case, I will paste in the run ID, I will also type in the name of that Python function representation, and I'll hit enter.

The download may take a moment, so let me explain what MLflow is doing behind the scenes. MLflow will fetch this serialized Keras model, that MLflow-formatted model, from the tracking server. It will then activate its conda environment. Once the environment has been activated, it will start a Flask server running in Python, and this Flask server will then load the generic Python function representation of the model such that when a request is received, it's transformed to a Pandas DataFrame, passed to the Python function, and then the output DataFrame is returned as JSON.
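To make the request path concrete, here is a minimal client sketch for a model served this way, using only the Python standard library. Several details are assumptions rather than anything shown in the talk: the server is taken to listen on 127.0.0.1:5000 and expose an /invocations endpoint, and the request uses the "pandas-split" JSON layout that MLflow 1.0-era scoring servers accepted. Newer MLflow releases expect a different JSON envelope, so check the documentation for the installed version. The function names are hypothetical.

```python
import json
import urllib.request


def build_split_payload(columns, rows):
    """Serialize rows in the split orientation: column names plus row data."""
    return json.dumps({"columns": list(columns), "data": [list(r) for r in rows]})


def score(rows, columns, url="http://127.0.0.1:5000/invocations"):
    """POST the rows to a locally served model and return the decoded JSON reply."""
    body = build_split_payload(columns, rows).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        # Assumed content type for the MLflow 1.0-era scoring server.
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A web app like the digit classifier in the demo would call something like score with one row of pixel values per drawing; the server turns that JSON into a DataFrame before handing it to the Python function.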

It looks like this loaded successfully. I'll go ahead and return to our web app and see if we can draw digits successfully this time. It looks like if we draw a five, we get a five. We can try a different digit. It's a poorly drawn two, but it seems to pick that one up. I know this model has some problems with sevens, so this reinforces the idea that MLflow is an ongoing experimentation tool by which users can continue to iterate and develop on their models without changing particular evaluation code. That concludes the overview of the collection of components associated with MLflow.

1.0 Release

Now I'd like to talk a little bit about the current state of the project as well as the ongoing development roadmap and tell you guys how to get started. MLflow recently released stable version 1.0 and there are several important features to call out. The first is a new metrics UI, which you saw briefly during the projects component, where runs can be visually compared against one another on multiple axes. Additionally, to call it out specifically, you can now associate metrics with particular training epochs or iterations via a generic step parameter to that log_metric call.
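As a rough illustration of the step parameter, the sketch below logs one value per training epoch. It assumes mlflow 1.0 or later is installed and that the values come from a training loop; the function name, the metric key "loss", and the optional log_fn hook (included so the loop can be exercised without a tracking server) are hypothetical.

```python
def log_epoch_losses(losses, key="loss", log_fn=None):
    """Log one metric value per epoch, using the epoch index as the step."""
    if log_fn is None:
        # Default to MLflow's fluent API, which logs to the active run.
        import mlflow

        log_fn = mlflow.log_metric
    for epoch, value in enumerate(losses):
        log_fn(key, float(value), step=epoch)
```

Because each value carries its step, the metrics UI can plot the metric against epochs rather than only against wall-clock time.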

In MLflow 1.0, search has been greatly improved. You can search model training sessions based on attributes such as parameter values, metric values, run start time or user, for example. Additionally, 1.0 supports packaging MLflow models as Docker containers for deployment to platforms like Kubernetes. Finally, the last feature I'd like to call out, which is actually a community contribution, is support for the ONNX framework in MLflow via an MLflow flavor.
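The attribute-based search mentioned above could be driven from Python along these lines. This is a sketch under assumptions: the parameter and metric names in the usage note are hypothetical, build_filter is an illustrative helper rather than part of MLflow, and the filter syntax shown (params.<name> = '<value>', metrics.<name> > <number>, joined with "and") reflects my understanding of the MLflow search grammar.

```python
def build_filter(params=None, min_metrics=None):
    """Compose a search filter from exact parameter matches and metric lower bounds."""
    clauses = []
    for name, value in (params or {}).items():
        # Parameter values are stored as strings, so they are quoted.
        clauses.append(f"params.{name} = '{value}'")
    for name, threshold in (min_metrics or {}).items():
        clauses.append(f"metrics.{name} > {threshold}")
    return " and ".join(clauses)


def find_runs(experiment_id, filter_string):
    """Return the runs in one experiment that match the filter string."""
    # Local import so build_filter is usable without mlflow installed.
    from mlflow.tracking import MlflowClient

    return MlflowClient().search_runs([experiment_id], filter_string)
```

For example, build_filter({"optimizer": "adam"}, {"val_acc": 0.9}) produces a filter selecting runs trained with that optimizer whose validation accuracy exceeded 0.9.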

Ongoing MLflow Roadmap

I'll provide a high-level overview of MLflow's ongoing development roadmap. We're actively working on a new component, which is a model registry for model management. This will allow users to version models as well as keep track of where they're deployed, and eventually this model registry will integrate with several monitoring and telemetry tools to provide insight into production model performance.

We're working on support for multistep project workflows, which allow one MLflow project to call another, creating a training execution pipeline. We're also working on introducing improved tracking APIs for Scala and Java. These APIs currently exist, we're just making them easier to use. We'd also like to package projects with build steps, so this is in the works, as well as provide an improvement on the input and output schema associated with that Python function representation, extending it to tools like NumPy, for example, to provide enhanced flexibility.

Get Started with MLflow

Getting started with MLflow is really easy. It's available on pip via pip install mlflow. You can find docs and examples online, and if you'd like to make a contribution to the project, you can find us on GitHub or Slack. Thank you so much for listening, and I hope you took some insights out of the open structure of the MLflow platform and learned a lot about its components.

Questions and Answers

Moderator: Thanks Corey [Zumar] for the great details on MLflow. I'm going to have a first question for you. That demo you had at the end was pretty cool. Is that something I can download? Is that on your laptop only, what you draw and then it...?

Zumar: I can go ahead and send you the source for it. It's a different open source GitHub project, and I modified it to communicate with that model.

Participant 1: I have a question about where to get started. I know there's a lot of tools. I'm really new to machine learning as a software developer. Maybe the first question is who are the competitors of MLflow, because I know that for every tool there's going to be competitors. Also, with the active development with all these libraries, which are the libraries that I constantly have to keep track of? I know that there are wonderful features coming out that blow my mind. My first question is who are the competitors and second, what are the libraries in the ecosystem that I have to constantly keep on top of?

Zumar: To start off by addressing what the competitive situation looks like, previously in the talk, I highlighted some major machine learning platforms that standardize this machine learning life cycle. Tools like Google's TFX and TensorFlow do provide competing implementations for model metadata tracking as well as deployments using tools like TensorFlow Serving. However, as I previously stated, these are often constrained to particular model types. There are a number of other paid services, for example, such as Weights & Biases, that will try to sell this functionality as well and provide metric logging, for example. We often find that they aren't quite as feature complete and they aren't as extensible by virtue of being paid services.

In terms of the complete management of model training and deployments and introducing these remote execution abstractions, MLflow is pretty unique in its open source structure. In that sense, there is another platform, I think DVC, for handling model training and data management, and MLflow is actively working to move more into training data collection and ingest and handle those portions of the life cycle as well. You can expect to see that on the ongoing roadmap. I hope that provided a pretty good overview of competing products. The big takeaway here is that MLflow is really the big open source and extensible one.

Additionally, to answer your other question about which libraries you need to stay on top of, MLflow provides these conda environments associated with the projects and the models components. The whole goal of providing these conda environments and Docker containers is to simplify library management. When you train a model with MLflow, oftentimes this set of library dependencies is automatically produced, which means that as soon as you try to go ahead and serve and continue to load and train that model, all of the library management is baked into the MLflow platform. This means that the easiest thing you can do is just pip install mlflow and get started. As soon as you save your model, all of those required libraries are going to be available.

Moderator: Is MLflow written in Python or Java, or what language is being used?

Zumar: The bulk of MLflow is written in Python. We provide tracking API implementations as well as model API implementations in Java and R and you can interact with various components such as deployment pieces, the remote project execution for example, via a command line interface. We're actively working to enhance the support for languages like R, Java and Scala. If there's a particular language you'd like us to support, please reach out on the Slack or file a GitHub issue, and we'd love to talk about collaborating on introducing support for additional languages.

Participant 2: I saw when you were doing the training data, how to connect that remote server. Is there an offline mode if you didn't have a network connection?

Zumar: When I was talking about the metadata or entity store that the MLflow backend is implemented on, one of those options was a file system. It was a very minor detail, but the point is that MLflow's tracking components can run on top of any UNIX or Windows file system. This also extends to artifacts, meaning that the easiest way to get started is by running and testing locally on your machine, and you can do that without calling out any particular tracking URI. It'll just start using the local directory associated with your training script unless otherwise specified.
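The local fallback described in this answer can be sketched as follows. This makes a few assumptions: that the MLFLOW_TRACKING_URI environment variable, when set, names the tracking server, and that an unset variable means runs land in an mlruns directory next to the training script, which is my understanding of MLflow's behavior around the time of the talk. MLflow applies this fallback on its own; the hypothetical resolve_tracking_uri helper only makes the rule explicit.

```python
import os


def resolve_tracking_uri(environ=None, default="./mlruns"):
    """Use the configured tracking URI if there is one, else a local directory."""
    environ = os.environ if environ is None else environ
    return environ.get("MLFLOW_TRACKING_URI") or default


def log_locally(param_name, param_value):
    """Record a single parameter in a run, with no network access required."""
    # Local import so the resolver above works without mlflow installed.
    import mlflow

    mlflow.set_tracking_uri(resolve_tracking_uri())
    with mlflow.start_run():
        mlflow.log_param(param_name, param_value)
```

With no environment variable set, log_locally writes run metadata under the local directory, which is the offline mode the question asks about.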



Recorded at:

Aug 20, 2019