
What You Should Know before Deploying ML in Production



Francesca Lazzeri shares an overview of the most popular MLOps tools and best practices, and presents a set of tips and tricks useful before deploying a solution in production.


Francesca Lazzeri, PhD, is an experienced scientist and machine learning practitioner, author of the book “Machine Learning for Time Series Forecasting with Python” (Wiley). Lazzeri is Adjunct Professor of AI and machine learning at Columbia University and Principal Data Scientist Manager at Microsoft. Before joining Microsoft, she was a research fellow at Harvard University.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.


Lazzeri: I'm Francesca Lazzeri. I'm a Principal Data Scientist Manager at Microsoft. I'm also Professor of Machine Learning and AI at Columbia University. In this session, we're going to learn together what you should know before deploying machine learning in production. There are many different limitations and opportunities during the machine learning lifecycle. MLOps, which stands for machine learning operations, can help data scientists and engineers overcome these limitations and actually see them as opportunities.

The importance of MLOps is due to the following reasons. First of all, machine learning models rely on a huge amount of data, and so it is very difficult for data scientists, and also for engineers, to keep track of all of it. It is also challenging to keep track of the different parameters that we tweak in machine learning models. As you know, small changes can lead to very big differences in the results that you get from your machine learning models. We also have to keep track of the features that the model works with. Feature engineering is another very important part of the machine learning lifecycle and can really impact your model accuracy. Then, monitoring a machine learning model is not really like monitoring deployed software or a web app. Debugging a machine learning model is actually a very complex type of work, because models rely on real-world data for predicting. As real-world data changes, it is important to track your model and also update it. This means that we have to keep track of new data changes and make sure that the model learns from them.

What You Should Know Before Deploying ML in Production

What are the four different aspects that you should know before deploying machine learning in production? We're going to look at four key aspects. These are different MLOps capabilities, open source integration, machine learning pipelines, and MLflow.

MLOps Capabilities

Let's start with MLOps capabilities. There are many different MLOps capabilities. The most important one is, first of all, the capability of creating reproducible machine learning pipelines. Machine learning pipelines allow you to define repeatable and reusable steps for your data preparation, training, and scoring processes. It's also important to create reusable software environments for training and deploying models. Then, registering, packaging, and deploying models from anywhere is a very important MLOps capability. You also need to track the associated metadata that is required to use the model. Then, capturing the governance data for the end-to-end machine learning lifecycle is another important aspect. In this case, the logged lineage information can include, for example, who is publishing the model, why changes were made at some point, and when different models were deployed or used in production. It is also important to notify and alert on events in the machine learning lifecycle. For example, experiment completion, model registration, model deployment, and data drift detection are all important notifications that you should have. Then, you should monitor machine learning applications for operational and ML-related issues. Here, it is important for data scientists to compare model inputs between training and inference, for example, and also to explore model-specific metrics and provide monitoring and alerts on the machine learning infrastructure. Finally, the last MLOps capability that I think is extremely important is the option of automating the end-to-end ML lifecycle with different machine learning pipelines. Using pipelines allows you to frequently update models, test new models, and roll out new models alongside your other AI applications and services.
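Comparing model inputs between training and inference can be sketched as a simple drift check. The following is a minimal illustration rather than a tool from the talk: it assumes a single numeric feature, and the function name `population_stability_index`, the bin count, and the 0.2 alert threshold are illustrative choices (the threshold is a commonly quoted rule of thumb, not a standard).

```python
import math
from typing import List, Sequence


def population_stability_index(
    train: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(train), max(train)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = bin_fractions(train), bin_fractions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Synthetic data: the "live" sample is either identical or shifted by 5.
train_sample = [x / 100 for x in range(1000)]
same_sample = list(train_sample)
shifted_sample = [x / 100 + 5 for x in range(1000)]

DRIFT_THRESHOLD = 0.2  # illustrative alert threshold

psi_same = population_stability_index(train_sample, same_sample)
psi_shifted = population_stability_index(train_sample, shifted_sample)
print("no drift:", psi_same)
print("drift alert:", psi_shifted > DRIFT_THRESHOLD)
```

In practice a check like this would run per feature on a schedule, and a value above the threshold would trigger one of the notifications mentioned above.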

Open Source Integration

Then, there is the second aspect that you should know before deploying machine learning in production. This is about open source integration. Here, there are three different steps that I think are extremely important when you think about open source integration. The first is the option of training open source machine learning models, which is great for accelerating your machine learning solutions. The second is open source frameworks for interpretable and fair models. Finally, there are different open source tools for model deployment. Let's start with training open source machine learning models. There are many different open source frameworks. Here, I listed only three of them: PyTorch, TensorFlow, and RAY. These are the three open source frameworks that I use the most.

PyTorch is an end-to-end machine learning framework, and it includes what we call TorchServe, which is an easy-to-use tool for deploying PyTorch models at scale. What is nice about PyTorch is that there is mobile deployment support and also cloud platform support. It's very nice and useful to use. Finally, the last thing that I want to mention about PyTorch is the C++ frontend support. This frontend is a pure C++ interface to PyTorch that follows the design and the architecture of the Python frontend. The other framework is TensorFlow.

TensorFlow is another end-to-end machine learning framework that is very popular in the industry. What I really like about TensorFlow is the option of using TensorFlow Extended, which is an end-to-end platform for preparing data, training, validating, and also deploying machine learning models in large production environments. A TensorFlow Extended pipeline is a sequence of components that implement a machine learning pipeline, which is specifically designed for scalable and high-performance machine learning tasks. This is another great option that you have.

The last option that I want to mention is RAY. RAY is for reinforcement learning types of scenarios. It is packaged with the following libraries that I listed here: Tune, RLlib, Train, and Dataset. Tune is great for hyperparameter tuning. RLlib is used for reinforcement learning. Train is for distributed deep learning. Then we have Dataset, which is for distributed data loading. The other two libraries that I want to mention for RAY are Serve and Workflows. These are libraries that are great at taking your machine learning models and distributed apps to production.

In terms of open source integration, there are two other open source frameworks that you should be aware of. These are frameworks for interpretable and fair models: InterpretML and Fairlearn. InterpretML is an open source package that incorporates machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and also explain blackbox systems. Moreover, it helps you understand your model's global behavior, or understand the reason behind individual predictions. Again, it is a great option when you have to build interpretable machine learning models. The other framework is Fairlearn. Fairlearn is a Python package that has two main components that I use most of the time. The first component is metrics for assessing which groups are negatively impacted by a model, and for comparing multiple models in terms of various fairness and accuracy metrics. The other component is algorithms. This is great because you have different algorithms for mitigating unfairness in a variety of AI and machine learning tasks, and also with different fairness definitions.
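The idea behind group metrics can be illustrated without any library. The sketch below is not Fairlearn's API; it is a standard-library approximation of the concept, with hypothetical function names (`accuracy_by_group`, `selection_rate_gap`) and made-up labels, showing how a metric is split by a sensitive feature to see which group a model serves worse.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Accuracy computed separately for each value of a sensitive feature."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def selection_rate_gap(y_pred: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference between groups in the share of positive predictions."""
    positives: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        total[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / total[g] for g in total]
    return max(rates) - min(rates)


# Made-up labels and predictions for two groups, "a" and "b".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = selection_rate_gap(y_pred, groups)
print(per_group)  # {'a': 1.0, 'b': 0.5}
print(gap)        # 0.5
```

Here the model is perfectly accurate for group "a" but only half right for group "b", which is the kind of disparity that group metrics are meant to surface before a mitigation algorithm is applied.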

Model Deployment - ONNX

Finally, the third aspect under the open source integration is about model deployment. When working with different frameworks and tools, it means that you have to deploy models according to the framework's requirement. In order to standardize this process, you can use what we call ONNX format. ONNX stands for Open Neural Network Exchange. ONNX is an open source format for artificial intelligence models, or for machine learning models. ONNX supports the interoperability between frameworks. This means that you can train a model in one of the many popular machine learning frameworks, for example, PyTorch, TensorFlow, and RAY. You can convert it into ONNX format and you can consume the ONNX model in different frameworks, for example, in

Specifically, there is ONNX Runtime. What is ONNX Runtime? ONNX is an open source format that is built to represent machine learning models. What is nice about ONNX is that it defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format to enable data scientists and AI developers to use models with a variety of different frameworks, tools, runtimes, and compilers. ONNX Runtime, or ORT, is great at optimizing and accelerating machine learning inferencing and training. You can, for example, train in Python and deploy with C#, Java, JavaScript, and many more. If you have specific questions about how to use ONNX and ONNX Runtime on Azure, feel free to contact Cassie Breviu. She is a fantastic product manager at Microsoft. She's always looking for scenarios on how data scientists and machine learning engineers are using ONNX and ONNX Runtime.

The other nice aspect of leveraging ONNX Runtime is the inference option. ONNX Runtime inference can enable faster customer experiences and also lower your costs, which is great. It supports models from deep learning frameworks such as PyTorch and TensorFlow, but also classical machine learning libraries such as scikit-learn. There are many different use cases for ONNX Runtime inferencing. For example, it improves the inference performance for a wide variety of machine learning models. It runs on different hardware and operating systems. You can train in Python and deploy into, for example, a C#, C++, or Java app. Finally, you can train and perform inference with models created in different frameworks. All of these represent excellent use cases and reasons why you should use and explore ONNX and ONNX Runtime.

There are many different popular frameworks that support conversion to ONNX. For some of these, for example PyTorch, ONNX format export is built in. For others, like TensorFlow or Keras, there are separate installable packages that handle this conversion. Here, there are some examples of model conversion. The process is very straightforward. First of all, you need to get a model. This model can be trained in any framework that supports export and conversion to ONNX format. Then you need to load and run the model with ONNX Runtime. Then, the third step is about tuning performance using various runtime configurations or hardware accelerators, in order to optimize your model.

Machine Learning Pipelines

The third aspect that you should know before deploying machine learning in production is about machine learning pipelines and how you can build these pipelines for your machine learning solution. Machine learning pipelines should focus on machine learning tasks such as data preparation, including importing, validating and cleaning, transformation, normalization, and staging of your data. Then, there is training configuration, including parameterizing arguments, file paths, and logging and reporting configuration. Then there is training and validating in a way that is efficient and also repeatable. Efficiency might come from specific data subsets, different hardware, compute resources, distributed processing, and also progress monitoring. Finally, there is the deployment step, which includes versioning, scaling, provisioning, and access control. One of the questions that I get most of the time is, which pipeline technology should I use? Here, I list three different scenarios. There is model orchestration, which is about the machine learning model. Then we have data orchestration, which is about data preparation. Then you have code and application orchestration.
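The training configuration task (parameterized arguments, file paths, logging configuration) can be sketched with the standard library alone. Everything named here is hypothetical: the `TrainingConfig` fields, the flag names, and the default values are placeholders, not settings from the talk.

```python
import argparse
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass(frozen=True)
class TrainingConfig:
    """All tunable inputs of a training step, gathered in one place."""

    data_path: Path
    output_dir: Path
    learning_rate: float
    epochs: int
    log_level: str


def parse_config(argv: Optional[Sequence[str]] = None) -> TrainingConfig:
    """Build the configuration from command-line arguments."""
    parser = argparse.ArgumentParser(description="Training step (illustrative)")
    parser.add_argument("--data-path", type=Path, required=True)
    parser.add_argument("--output-dir", type=Path, default=Path("outputs"))
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument(
        "--log-level", default="INFO", choices=["DEBUG", "INFO", "WARNING"]
    )
    # argparse turns "--data-path" into the attribute name "data_path",
    # so the parsed namespace maps directly onto the dataclass fields.
    return TrainingConfig(**vars(parser.parse_args(argv)))


config = parse_config(["--data-path", "data/train.csv", "--epochs", "5"])
logging.basicConfig(level=config.log_level)
logging.info("Training with %s", config)
```

Keeping the configuration in a single immutable object makes a run repeatable: the same arguments always describe the same training step, and the object can be logged alongside the results.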

Let's start from the first one. Here we have model orchestration. The primary persona is a data scientist. In terms of open source options, we have Kubeflow Pipelines that you can leverage. The canonical pipeline is from data to model. Then we have data orchestration, which is about data preparation. The primary persona is a data engineer. In terms of open source options, we have Apache Airflow. The canonical pipeline here is data to data. Finally, the third scenario that I found very popular is code and application orchestration. Here, the primary persona is an app developer. The canonical pipeline here is code plus model to an app or a service.

When you create and run a pipeline object, the following high-level steps occur. This is an example of a pipeline object that is created on Azure Machine Learning. For each step, the service calculates requirements for the hardware compute resources, OS resources (for example, Docker images), software resources (for example, Conda), and data inputs. Then the service determines the dependencies between steps, resulting in a very dynamic execution graph. When each node in the execution graph runs, the service configures the necessary hardware and software environment. Then the step runs, providing logging and monitoring information to its containing experiment object. When the step completes, its outputs are prepared as inputs to the next step. Finally, the resources that are no longer needed are finalized and detached.
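The idea of deriving an execution graph from step dependencies and passing each step's outputs to the next can be sketched in plain Python. This is not how Azure Machine Learning is implemented; it is a toy illustration using the standard library's `graphlib`, with hypothetical step names and trivial stand-in computations.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

# Each step lists the steps whose outputs it consumes (hypothetical pipeline).
dependencies: Dict[str, Set[str]] = {
    "prepare_data": set(),
    "train": {"prepare_data"},
    "validate": {"prepare_data", "train"},
    "deploy": {"validate"},
}

# Stand-in step bodies: each receives the outputs of its dependencies.
steps: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "prepare_data": lambda inputs: [1.0, 2.0, 3.0, 4.0],
    "train": lambda inputs: sum(inputs["prepare_data"]) / len(inputs["prepare_data"]),
    "validate": lambda inputs: max(
        abs(x - inputs["train"]) for x in inputs["prepare_data"]
    ),
    "deploy": lambda inputs: f"deployed (max error {inputs['validate']})",
}


def run_pipeline(
    dependencies: Dict[str, Set[str]],
    steps: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Tuple[List[str], Dict[str, Any]]:
    """Run every step after its dependencies, feeding outputs forward."""
    outputs: Dict[str, Any] = {}
    order: List[str] = []
    for name in TopologicalSorter(dependencies).static_order():
        inputs = {dep: outputs[dep] for dep in dependencies[name]}
        outputs[name] = steps[name](inputs)
        order.append(name)
    return order, outputs


order, outputs = run_pipeline(dependencies, steps)
print(order)              # ['prepare_data', 'train', 'validate', 'deploy']
print(outputs["deploy"])  # deployed (max error 1.5)
```

A real pipeline service adds what this sketch leaves out: provisioning the environment for each node, logging to the experiment, and releasing resources once a step's outputs are no longer needed.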


MLflow

The final tool that you should consider before deploying machine learning in production is MLflow. Let's learn together what MLflow is. MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles four primary functions that are extremely important in the machine learning lifecycle. The first is tracking experiments to record and compare parameters and results. Then there is packaging ML code in a reusable and reproducible form in order to share it with other data scientists or transfer it to a production environment. The other aspect is managing and deploying models from a variety of machine learning libraries to a variety of model serving and inference platforms. Finally, there is providing a central model store to collaborate on and manage the full lifecycle of a machine learning model, including model versioning, stage transitions, and annotations.

Let's start with the first one, which is MLflow Tracking. MLflow runs can be recorded to local files, to a SQLAlchemy-compatible database, or remotely to a tracking server. You can log data to runs using the MLflow Python, R, Java, or REST APIs. MLflow allows you to group runs under experiments, which can be useful for comparing runs that are intended to tackle a particular task, for example. Then there is MLflow Projects. An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. In addition, the Projects component includes an API and command-line tools for running projects, making it possible to chain projects together into workflows.

Then there is MLflow Models. An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools, for example, real-time serving through a REST API or batch inference on Apache Spark. Each model is a directory containing arbitrary files, together with an MLmodel file in the root of the directory that can define multiple flavors that the model can be viewed in. Then there is the MLflow Model Registry. The MLflow Model Registry is a component that provides a centralized model store, a set of APIs, and a UI in order to manage, in a collaborative way, the full lifecycle of an MLflow model. It provides model lineage, model versioning, stage transitions, and annotations. The model registry is extremely important if you're looking for a centralized model store and a set of APIs in order to manage the full lifecycle of your machine learning models.

Summary & References

These four aspects are extremely important. You should know them before deploying your machine learning solutions in production. Those four aspects are different MLOps capabilities, different open source integrations and frameworks, different machine learning pipelines, and finally the MLflow open source tool, which can really help you with deploying machine learning in production. In this slide you can find a list of references to the different GitHub repos and documentation for some of the different open source tools that I have been using for this presentation.

Questions and Answers

Greco: Of all the stuff, certainly those four pillars, you need to have the font of those, like 64 point, because those are the critically important, those four pillars. It's amazing. Of those four, you certainly don't see new customers diving into those four. What are the big mistakes that customers make? They might not know about those four pillars, but what's a very common mistake, because there's certainly a lot of failed ML projects, more than we like, in the ML field. What are some common mistakes?

Lazzeri: One of the biggest mistakes that customers, and in general developers and companies, make when we get to MLOps is thinking of MLOps as a product. We have a lot of machine learning frameworks, we have a lot of cloud providers, and we have different AutoML capabilities. People think that MLOps is just another product or another set of tools that they need to add to their end-to-end machine learning solutions. That is not really the case. I think that MLOps is more about culture. It's more about thinking about how you can connect different tools in your end-to-end development experience, how you can make sure that you are aware of these capabilities that I tried to summarize, and how you can optimize some of these opportunities that you have. Really, MLOps is more about strategy and about making sure that you are aware of all the tools. Probably all of the tools that I was presenting are open source tools, which is great, because you have the support of the community. You can also contribute to those open source tools.

One of the biggest mistakes that customers and companies make most of the time is to think about MLOps as a static tool that you can just implement the way you implement a machine learning or deep learning framework. It's not really like this. It's more about making sure that you are aware of all these opportunities that you have on the table, and that you are able to connect these different options in the way that is best for you and for your application.

Another important mistake that some of the companies I have been working with in the machine learning space make is that they do not share information between different professionals. Most of the time, we have data scientists who speak their own language and are very familiar with some of the most important open source frameworks for machine learning, deep learning, and reinforcement learning. I mentioned some of them: RAY, TensorFlow, and PyTorch. Then, they're not really aware of or familiar with the different open source tools that make sure the deployment is successful: how we can move the machine learning model to a production environment and make sure that we can build an AI application that other people can consume. That is, in my opinion, another cultural aspect.

I think that it's important to have a technical team. As a manager, or as a developer, you need to know that there are different professionals who are probably working with specific tools, but you have to make sure that they communicate with each other so that they all have a clear understanding of what the end-to-end solution is going to do and, most importantly, what final outcome you want to support with these end-to-end solutions. It's all about talking with data scientists, developers, data engineers, and machine learning engineers. Different companies have different names and titles for these professionals. At the end of the day, those are all people who work in the machine learning industry. Some of them are responsible for the data preparation and the model training, testing, evaluation, and then deployment. Some others are responsible for the data pipelines and the model pipelines, and how they can deploy these models in production in a successful way. That is another mistake that most of the professionals in the industry make when we get to MLOps.

One of the biggest issues that we have at the moment in the industry is that we have so many great capabilities to develop machine learning models in open source frameworks. This is fantastic, because you're not just using inbuilt models from a specific provider, but you're leveraging the knowledge and the support of the open source community. Then, the issue that we have is that 80% of these models, they're never pushed to production, and they're never really used for a specific business case. Making sure that you are aware of these open source and MLOps capabilities, I think, is the key to make sure that you know how to put together all these different pieces, and how you can make sure that your team is talking, and they are all part of the same solution and of the same goal.

Greco: That's no different than traditional application building these days. It's like you better have an idea of what is the end goal, what problem are you solving. It's very important. I know some failures were, 10 data scientists were hired for a project and they failed to put in production and they failed to do data engineering. They were data scientists. You didn't have the rest of the team. It's certainly an issue.

You did talk about putting the models, whether it's through ONNX, or some other standard mechanism, into production. It seems like there is an interesting trend now of using multiple models, using multiple cooperating models, or maybe not even coop, maybe we have adversarial models, we have different models to use. How do you deploy something like that when you have different models? Any tricks or any tips on that?

Lazzeri: The first suggestion that I have for data scientists who are using different typologies of models is to understand why they're using different models. Are they models that are answering the same question, or are some of those models actually creating data features that are then used to feed other models, so that we have more of a process of different models? If it is the second scenario, where multiple models work together but are processed in a specific order, because some of those models generate the data that you need in order to run other models, it's a less complex situation. At the end of the day, you need to generate specific results that come only from the last model that you have. You can use simple mechanisms, like deploying it in Python. You can create your pickle file, where you just have to make sure that you summarize all the important information. Most of the time, you summarize this information for your model with Python functions; we call them init and run functions. These are functions that you can just write in Python to define how the data needs to be ingested, and then how the model needs to run. Then you proceed with a normal deployment process, which you can do in any programming language that you prefer. The goal for that scenario is to generate this pickle file that is then going to be translated into a web application or web service. It is more like an API that other engineers can leverage to run this application. That is the second scenario. I started with the simplest one.
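The init and run pattern mentioned here can be sketched with the standard library. This is an illustration under assumptions, not the speaker's actual scoring script: the pickled "model" is a plain dictionary of made-up linear coefficients, and the file path and the JSON request shape (`{"data": [...]}`) are hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path

MODEL_PATH = Path(tempfile.mkdtemp()) / "model.pkl"

# Stand-in for training: persist the learned parameters as a pickle file.
with MODEL_PATH.open("wb") as f:
    pickle.dump({"weights": [2.0, -1.0], "bias": 0.5}, f)

model = None  # populated once by init()


def init() -> None:
    """Load the model once, when the service starts."""
    global model
    # Only unpickle files from a trusted source.
    with MODEL_PATH.open("rb") as f:
        model = pickle.load(f)


def run(raw_data: str) -> str:
    """Parse a JSON request, score each row, and return a JSON response."""
    rows = json.loads(raw_data)["data"]
    predictions = [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]
    return json.dumps({"predictions": predictions})


init()
response = run(json.dumps({"data": [[1.0, 2.0], [3.0, 0.0]]}))
print(response)  # {"predictions": [0.5, 6.5]}
```

For a chain of models, `run` would call the upstream models first and pass their outputs as features to the last one, so callers of the web service still see a single request and response.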

The first scenario is actually multiple models that are somehow running in parallel to generate the same type of insights or predictions in order to support the same scenario. In that case, I think the best option is to use tools such as ONNX at the end of your machine learning architecture, in order to standardize and normalize all the different languages and frameworks that you're using. That is my suggestion. Again, my suggestions are all based on my experience as a data scientist manager. What I've been seeing is that most of the time you are running multiple machine learning pipelines at the same time, because, again, you want to scale your solution. Using a standardization tool at the end, like ONNX, is the best approach that I have been using, at least until now.

The other quick suggestion that I have is about automated machine learning. That is another tool that many different providers have been using a lot. Automated machine learning is not just a blackbox tool, you can consume it with the Python SDK. Basically, what it does for you is not just selecting the best model, but it's actually running multiple models in parallel. Then it's doing hyperparameter tuning for you. It's also trying to select the best model based on your scenario and on your input data. That is another way to scale your solution, not again, because you cannot select your own algorithm. At the end of the day, you are going to be the one who is going to select that. It's just an additional tool that can help you to scale your solution and also improve the time to production. It's between ONNX and automated machine learning, those are the two tools and the two suggestions that usually I have for these type of machine learning scenarios.

Greco: We had a question about monitoring tools for models. Monitoring in a sense of prediction, accuracy, or application performance? Any suggestions there?

Lazzeri: Monitoring models in production is something that I have been doing more with machine learning pipelines. One of the Python packages that I have been using for that collects all the log information in the machine learning pipeline. This information is not just about metrics, but also about performance. It basically extracts a report for you. This report tells you how the model is performing from an accuracy point of view, that is, whether the model, refreshed with new data, is still performing well and still producing good, accurate results or not. Then it also gives you additional information on the actual performance of the model. Is the model really healthy? Is it still performing in an accelerated way or not? That is something that I have been using.

The monitoring of the model is still a more manual type of work for me. It's true that there is this package that produces those reports for me, which is great. In my experience, we haven't built an anomaly detection model that tells me that the model's performance is decreasing, or that the new data that the model is ingesting is not as good as the data that we used to build the model. There are many different messages that this additional algorithm or solution could provide to me, and we haven't really done that yet. For me, it's more like a manual check-in. However, I use this additional package that provides me with this very accurate report that is still very easy to digest and look at. There is still some support. This is what I have done so far.




Recorded at:

Oct 06, 2022