
CI/CD for Machine Learning



Sasha Rosenbaum shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability.


Sasha Rosenbaum is a Program Manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She is a co-organizer of the DevOpsDays Chicago and DeliveryConf conferences, and recently published a book on serverless computing in Azure with .NET.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Rosenbaum: This is a video of a machine-learning simulation learning to walk and get past obstacles, and it's there only because I like it. It's also a kind of metaphor for me trying to build the CI/CD pipeline. I'm going to be talking about CI/CD for machine learning, which is also called MLOps. The words are hard, and we don't really have to define these particular terms, but we do have to define some other things, and we're actually going to talk about definitions a lot.

I'm going to start by introducing myself. I'm on the left; this picture is from DevOpsDays Chicago, and our mascot is a DevOps yak. It's not a Chicago bull, it's a yak, and it's awesome. You can come check out the conference. I work for Microsoft on the Azure DevOps team. I come from a developer background, and then I did a lot of work with DevOps, CI/CD, and such. I'm not a data scientist; I took some classes on machine learning just to get context on this, but I'm coming to this primarily from a developer perspective.

I also run another conference, and this is a shameless plug: it's DeliveryConf. This is the first year it's happening, and it's going to be in Seattle, Washington, on January 21 and 22. You should register for it right now because it's going to be awesome.

The first thing I want to do is set an agenda. An hour is a long time to be here, so I want to set expectations for what we're in for. I'm going to talk about machine learning, try to define what machine learning really is, and describe what automation for machine learning could look like. Then we're going to talk about a potential way to implement that automation. Then I'm going to demo the pipeline and hope that it works.

This slide is here for me so that I remember that I need to trigger my pipeline. I'm going to go in here and make a super meaningful change. We can go back to the slides, because this thing takes a while to run.

What Is MLOps

Let's talk about what MLOps is and why you should care about it. Machine learning is the science of getting computers to act without being explicitly programmed. It's different from traditional programming, because in traditional programming we define the algorithm: we spell out the if/else logic, the whole decision tree, and all of that. In machine learning, we don't do that. We let the machines teach themselves what the algorithm should look like.

This idea first came up about 50 years ago, and so we call everything artificial intelligence. People think about Skynet and artificial general intelligence controlling us. Mostly, it's a lot simpler than that: it's narrow intelligence, a subset of that is machine learning, and a subset of that is deep learning. Deep learning is where we get into things like processing images and identifying what's in them. When I went to college, a lot of people said that computers were never going to be good at identifying images, and now I'm being [inaudible 00:05:15] doing all my pictures.

Machine learning is clearly on the rise. I don't have pretty Forrester reports, so this is anecdotal evidence: searches for machine learning have overtaken searches for DevOps. Also, if we look at Stack Overflow, we can see that Python, just 10 years ago, was the least popular of the five languages being searched for, and now it's the most popular, primarily because of machine learning. Also, I talk to a lot of customers, and every single one of them is doing machine learning.

Why Should You Care?

Why should you care? We're talking to a room full of developers and CI/CD professionals, so why should you care about MLOps? Shouldn't data scientists be the ones who care about MLOps?

This is a tweet from someone at a different conference, and it says, "The story of enterprise machine learning is: it took me 3 weeks to develop a model and it's been 11 months and it's still not deployed." This is really the case for most of the people I work with. Their experience is that it takes a really long time to deploy these models. The best-kept secret is that data scientists mostly want to do data science.

The thing is, people go to school for a really long time to learn how to do these things. "This is deep learning, this is some deep neural network," and "This is backpropagation, and I tried to understand it and it was really hard." People really spend a lot of time trying to understand how to make these algorithms better. That's a whole job. Putting the model in production is not part of that job; these are two separate jobs, and someone actually needs to go and put this in production after we've developed it. For most data scientists, the work environment looks like this. We talk about [inaudible 00:07:31] in our books, or about just developing scripts in Python or Scala, whatever it is. Now I can see that a lot of people actually have source control, which is awesome, but from there to production is a very long way.

The problem that I see most is that data scientists want to do ops, but ops people don't really understand the challenges of data science and how it's really different. If we look at how traditional programming works, we develop the algorithm, we give the algorithm data, and then it gives us answers. When we switch to machine learning, we actually start with answers and data, and then we produce an algorithm. This is also called a model, and we can give that model new data and it will try to predict the future for us.
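To make the contrast concrete, here is a minimal Python sketch (not from the talk; the "expensive house" rule and the data are made up for illustration). The first function is a hand-written rule. The second starts from data plus answers (labels) and produces the rule, which then acts as a tiny model for new data.

```python
# Traditional programming: a human writes the decision rule explicitly.
def is_expensive_by_rule(price: float) -> bool:
    return price > 500_000  # threshold chosen by a programmer


# Machine learning, in miniature: start from data plus answers (labels)
# and let the computer find the rule.
def learn_threshold(prices: list[float], labels: list[bool]) -> float:
    """Return the observed price that, used as a threshold, makes the fewest mistakes."""
    best_threshold, best_errors = prices[0], len(prices) + 1
    for candidate in sorted(prices):
        errors = sum((p > candidate) != label for p, label in zip(prices, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Illustrative training data: prices and whether each house counted as "expensive".
prices = [200_000, 350_000, 450_000, 600_000, 750_000]
labels = [False, False, False, True, True]

threshold = learn_threshold(prices, labels)  # the learned "model" parameter


def predict(price: float) -> bool:
    """Apply the learned rule to new data."""
    return price > threshold


print(threshold, predict(500_000), predict(400_000))  # 450000 True False
```

Real training replaces this brute-force search with an optimizer over many parameters, but the shape is the same: data and answers go in, and a parameterized rule comes out.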

If we do that, how do we get this whole thing into production? For that, I want to dive even deeper and ask, "OK, so what actually is an ML model?"

What Is an ML Model?

A lot of times we ask, "What is this abstract object that we're talking about?" We've just said that ML is often very complicated and takes a long time to learn, but that's not always the case. Most people doing ML out there are actually doing something fairly simple that can be done with an Excel spreadsheet. We're doing a lot of linear regression. This particular example looks at housing prices in a particular area over time. I build an equation that allows me to predict what the housing price is likely to be in the future. This is machine learning.
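The housing example can be sketched as a one-variable least-squares fit. This is an illustration, not the model from the talk: the yearly prices are made-up numbers (in thousands) chosen so the arithmetic is easy to check by hand.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data: average price (in thousands) per year in one area.
years = [2015, 2016, 2017, 2018, 2019]
prices = [300, 310, 320, 330, 340]

slope, intercept = fit_line(years, prices)
print(slope, intercept)                          # 10.0 -19850.0
print("predicted 2020 price:", slope * 2020 + intercept)  # 350.0
```

The two learned numbers, `slope` and `intercept`, are the whole model here, and they are what would need to be versioned and deployed.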

This is also machine learning. If I have something more complicated, I can have a vector as input and a vector as output. If I look at image classification, for example, one of the ways (not the only way) it can be broken down is into RGB matrices of pixels. This is a huge array of input numbers that I take in, and at the end of it, I just try to say, "Does this picture have a cat or does it not have a cat?"
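Here is a minimal sketch of that idea. It assumes a plain logistic-regression-style classifier rather than the deep networks normally used for images. The pixels are flattened into one input vector, and a formula maps that vector to a probability that the picture contains a cat. The weights and bias are invented placeholders; a real classifier would learn them.

```python
import math

Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def flatten_rgb(image: list[list[Pixel]]) -> list[float]:
    """Turn an H x W grid of RGB pixels into one flat vector scaled to [0, 1]."""
    return [channel / 255 for row in image for pixel in row for channel in pixel]


def cat_probability(vector: list[float], weights: list[float], bias: float) -> float:
    """Linear score squashed by a sigmoid into a 0-1 'has a cat' probability."""
    if len(vector) != len(weights):
        raise ValueError("weights must match the flattened image size")
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 / (1 + math.exp(-z))


# A 2x2 "image" becomes 2 * 2 * 3 = 12 input numbers.
white = [[(255, 255, 255), (255, 255, 255)], [(255, 255, 255), (255, 255, 255)]]
black = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]

# Hypothetical parameters; a real model would learn far more of them from data.
weights = [0.1] * 12
bias = -0.6

print(len(flatten_rgb(white)))  # 12
print(cat_probability(flatten_rgb(white), weights, bias))
print(cat_probability(flatten_rgb(black), weights, bias))
```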

Basically, a machine learning model is the definition of a mathematical formula with a number of parameters that are learned from the data. We didn't go into what the formula is, but there is a mathematical formula that we got from model training, and that's what we're now trying to deploy. This is great news, because usually it means that we know what to do with it. We think, "OK, we're just going to create an API endpoint. We're going to serve it as a service, this is the solution to all of our problems, and it's going to be great." The only thing is, these models come in different shapes and forms, training takes a long time, and so on. So we've defined the problem: we want to get the model into an API that we're serving, and that would probably be a good thing.
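If the model is a formula plus learned parameters, the serving side is easy to picture. Below is a minimal, framework-free sketch (standard library only) of the request handling an API endpoint would do. The parameter values are hypothetical. In practice this function would sit behind a web server or the serving layer of a tool like Kubeflow or Azure ML.

```python
import json

# Hypothetical learned parameters for a yearly house-price line (price in thousands).
MODEL = {"version": "1", "slope": 10.0, "intercept": -19850.0}


def predict(features: dict) -> float:
    """The model itself: a formula plus parameters learned during training."""
    return MODEL["slope"] * features["year"] + MODEL["intercept"]


def handle_request(body: str) -> tuple[int, str]:
    """Framework-free stand-in for a scoring endpoint: JSON in, (status, JSON) out."""
    try:
        prediction = predict(json.loads(body))
    except (json.JSONDecodeError, KeyError, TypeError):
        return 400, json.dumps({"error": 'expected a JSON object like {"year": 2020}'})
    # Returning the model version lets you trace every answer back to a model.
    return 200, json.dumps({"model_version": MODEL["version"], "prediction": prediction})


print(handle_request('{"year": 2020}'))  # (200, '{"model_version": "1", "prediction": 350.0}')
print(handle_request("not json"))
```

Keeping the parameters separate from the serving code means that deploying a retrained model only swaps the parameters and the version string. The endpoint itself does not change.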

Do Models Really Change that Often?

When I was talking to someone about this conference, they asked me, "Do the models really change that often? Do we really need a process for automating models? Maybe it's OK that it took me 11 months to put it in production, because it really doesn't change that much."

I just want to give one example. This is a Reddit thread with 13,000 responses, so it must have resonated. It says, "Facebook's list of suggested friends is quite literally a list of people I've been avoiding my entire life." This was absolutely true for a couple of months, and it was highly annoying, but then it went away. This is machine learning predicting who my friends are. That model got better, and so do the models that match you with Uber drivers and things like that. If Uber, say, made a mistake and published a highly inaccurate model that matched you with a driver in a different city, you really wouldn't want to wait a couple of weeks for the model to be updated. You would want it fixed as soon as possible.

The Dataset Matters

The other challenge I want to talk about is that the dataset is highly important here. That's a very different aspect, because code is usually light, easy to check in, and easy to version, but datasets can be very large, very unwieldy, and very difficult to automate.

I did this as an exercise before, and it's pretty cool: what we know about the world depends on what we know from the data. This is a cat, and this is not a cat. This seems straightforward. Then I can get into, "This is also a cat, and this is also a cat." I need to be able to identify this picture as well as I identify the other one. If I didn't have any examples of a picture like that, I might not know that about the world. Then this is a cat. This might or might not be a cat, depending on your definition. Also, from a machine-learning perspective, I'm again just looking at the pixels in the matrix and asking, "Is this a cat?" I don't know. If I've never seen a picture of a fox, maybe I think it is a cat. We can get into weird stuff too.

The point I'm trying to make here is that model predictions depend heavily on what the model has seen. I cannot talk about versioning the model independently of the dataset. I need some way to check in my dataset as part of the model version.
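One lightweight way to tie a model version to its dataset is to fingerprint the data files and store that fingerprint next to the code commit in the model's version record. This is sketched here as an assumption, not as what Azure ML or Kubeflow do internally. Any change to the data, such as one new mislabeled image, changes the fingerprint. The model name and commit ID below are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dataset_fingerprint(data_dir: Path) -> str:
    """SHA-256 over every file's relative path and content hash, in sorted order."""
    manifest = hashlib.sha256()
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        file_hash = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large files don't have to fit in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                file_hash.update(chunk)
        line = f"{path.relative_to(data_dir).as_posix()} {file_hash.hexdigest()}\n"
        manifest.update(line.encode())
    return manifest.hexdigest()


def model_version_record(model_name: str, code_commit: str, data_dir: Path) -> dict:
    """Record that ties a model version to the exact code and data that produced it."""
    return {
        "model": model_name,
        "code_commit": code_commit,
        "dataset_sha256": dataset_fingerprint(data_dir),
    }


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cats").mkdir()
    (root / "cats" / "001.jpg").write_bytes(b"pretend image bytes")
    before = model_version_record("cat-classifier", "abc1234", root)

    (root / "cats" / "002.jpg").write_bytes(b"a fox, labeled as a cat")
    after = model_version_record("cat-classifier", "abc1234", root)

print(json.dumps(before, indent=2))
print("dataset changed:", before["dataset_sha256"] != after["dataset_sha256"])  # True
```

The data itself can then live in cheap storage outside Git. Only the fingerprint has to travel with the model.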

These are just some of the challenges. I'm not diving into all of them, and I know that a lot of speakers today talked about other things, such as parameters. If we look at big companies like Uber, Microsoft, or Facebook, they actually use a lot of ML in production.

This translation is being run for me by a machine-learning translator. It's updated frequently, and [inaudible 00:14:56] PowerPoint; this is being shipped all the time. What we actually did for this is build a huge custom process and implement a lot of things. We automated the deployment, and potentially we have different people in different departments doing the same thing over and over again. Microsoft has this big rule of eating our own dog food, and it works really well for us because we're our own first testers. This is how stuff starts: a lot of people working to make sure it's automated.

How Do We Iterate?

Most people I know don't work for a big company with thousands of engineers who can invest a lot of time in automating deployments. So let's talk about how we can iterate in a homegrown fashion. I'm going to talk a little bit more about what the process actually looks like. First, we want to train the model. Again, there are different challenges in training the model and different aspects we need to look at. Then we want to package it, validate that it actually works (and probably that it works better than the model we already have), then deploy it and monitor it. Models also drift over time, so even if my model is highly accurate today, it might not be accurate tomorrow because the world or my data has changed, and I need to actually update it. Ideally, I want to automate this whole process.
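The train, validate, and deploy part of that loop can be sketched with plain functions. This is a hypothetical skeleton, not the pipeline from the demo. `train` and `deploy` are stand-ins for real steps, and the gate refuses to replace the production model unless the candidate scores better on a held-out metric.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: str
    accuracy: float  # measured on a held-out validation set


def run_pipeline(
    train: Callable[[], ModelVersion],
    in_production: Optional[ModelVersion],
    deploy: Callable[[ModelVersion], None],
    min_gain: float = 0.0,
) -> ModelVersion:
    """Train a candidate, compare it with production, and deploy it only if it is better."""
    candidate = train()
    if in_production is not None and candidate.accuracy <= in_production.accuracy + min_gain:
        print(f"keeping {in_production.version}: {candidate.version} is not better")
        return in_production
    deploy(candidate)
    return candidate


deployed: list[ModelVersion] = []
prod = ModelVersion("v1", 0.80)

# A worse candidate is rejected; a better one is deployed.
after_first = run_pipeline(lambda: ModelVersion("v2", 0.78), prod, deployed.append)
after_second = run_pipeline(lambda: ModelVersion("v3", 0.85), after_first, deployed.append)
print([m.version for m in deployed])  # ['v3']
```

Packaging and monitoring are left out here. Monitoring is the part that would eventually trigger `run_pipeline` again when the model drifts.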

Data scientists and DevOps professionals have some different concerns, but they also share some of the same concerns. Everybody cares about iteration; everybody wants to get there as soon as possible. Everybody cares about versioning, and versioning, again, is much harder for models than for code. Everybody cares about reuse. Then there are some different concerns, such as compliance, observability, and so on.

If we compare the high-level processes: in application development, we would just check our code into source control. In this case, we want to save our code, our dataset, and some metadata as well.

Then, when we talk about building the application, we want to automate the pipeline. Ideally, we want to automate hyperparameter tuning and have nice things in the training process itself. Then we want to test or validate the model, so we want to somehow estimate that its accuracy is good, and maybe check some other parameters of the model and how valid it is. We want to deploy it into production. Then, as we discussed, we want to monitor it, analyze its performance, and potentially retrain the model over time.

How could you do it yourself? You could use the tools that you already have. If you're not Google or Microsoft, you probably already have something from this slide in-house. You probably have GitHub, or Bitbucket, or something similar; you probably have Jenkins, or Azure DevOps, or something similar. You probably already have pieces of this. For my specific demo, I'm going to use GitHub for source control, Azure DevOps (surprise) for automation, and both Kubeflow and Azure ML workspaces for the model training process.

Kubeflow is an open-source project based on Kubernetes: anywhere you can deploy Kubernetes, you can deploy Kubeflow. It focuses on reusable workflow templates that run in containers, so you can develop complicated workflows for your ML training and run them on containers. Whenever you check in one of these steps, you create a new container, package it, push it into Kubeflow, and then run the pipeline using these containers. Essentially, you just need to version the container, and then you can run the steps.

Azure ML is an Azure-based service, so this one's not open source, but it's the build-versus-buy thing. It does the same thing: it aims to be everything you need for automating your ML pipelines. I could have done this particular demo with just Kubeflow or just Azure ML, because they can both do the same things; I'm using both primarily because I wanted to play with both and see what they look like. Azure ML allows you to prep the data, train, deploy, version the model, check in datasets, and work with notebooks right there. Again, Kubeflow allows most of the same things as well.


I'm going to do the demo now, and we're going to hope that everything is working. I did check in code that kicked off the pipeline. I'm in Azure DevOps right now. Azure DevOps is an automation server: a SaaS product that allows you to manage all of the steps in your application lifecycle, from project management to CI/CD and everything else. It has a forever-free tier, so you could use it without paying us any money. It also has a free tier for open-source projects, so it's fairly easy to use and enjoy.

What I'm looking at here is a YAML build pipeline for this. I'm going to show you the code that it all starts with. In my case, I'm not even using notebooks; I'm just using Python scripts, one for each of three steps: pre-process, train, and deploy. Then I have a file for the pipeline itself, a Kubeflow pipeline that just defines the Kubeflow steps.

Whenever I check in code, the pipeline is triggered. Again, there are ways to optimize this (you probably don't want to trigger model retraining because of a change to a readme), but for the sake of this example, this works. I have the Dockerfiles for my containers, and they package the scripts. Then I push these containers into an Azure Container Registry. After that, my last step here just kicks off the Kubeflow pipeline. Surprise, I'm running it on Azure, because I work for Microsoft and they give me lots of Azure to play with.
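The readme problem can be handled with a path filter that retrains only when files affecting the model change. Azure Pipelines triggers support `paths` include and exclude filters for this. The Python below sketches the same decision as a standalone check. The directory names are hypothetical, not the layout of the demo repository. In CI, the list of changed files could come from `git diff --name-only` between the previous and current commit.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files whose changes should trigger retraining.
# Note: in fnmatch, "*" also matches "/", so "code/training/*" covers subfolders too.
RETRAIN_PATTERNS = [
    "code/preprocess/*",
    "code/training/*",
    "code/pipeline.py",
    "*Dockerfile",
]


def should_retrain(changed_files: list[str]) -> bool:
    """True if any changed file matches a pattern that affects the model."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in RETRAIN_PATTERNS
    )


print(should_retrain(["README.md"]))                             # False
print(should_retrain(["README.md", "code/training/train.py"]))  # True
print(should_retrain(["code/deploy/Dockerfile"]))                # True
```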

I have a couple of resources in here. Another challenge with machine learning is that you need a lot of compute to run it on, so there's only so much you can get away with for free. Some of this stuff can be run for free, so if you just wanted to deploy it for the sake of the example, you could do it with an Azure trial account. I have a Kubernetes cluster that's running Kubeflow, and another Kubernetes cluster that's going to run my model at the end of it. Then I have the ML workspace over here that I'm checking my models into, and all of these things.

The last thing I wanted to show you is the GitHub repository. It has all of the code: the code for the models, the scripts, the Kubeflow pipeline, and the Azure DevOps pipelines, so you can come here, take all of this, and build it all together. It has the steps for deployment on Azure, but you could deploy this pretty much anywhere you have a Kubernetes cluster. With Azure ML, you can only deploy in Azure, but again, it's a SaaS product, so you can just use it without having to manage much yourself. You can go to this link (I have it in the resources slide) and, if you have a couple of days, go and build this yourself.

I ran the pipeline; it did the build-and-deploy steps and triggered my Kubeflow pipeline. My Kubeflow pipeline is very simple. Again, in this case, I'm not doing anything complicated. The problem with complicated is twofold: one, I have to understand the workflow, and two, it has to run fast enough for me to demo it, and that doesn't happen. I wanted to show some cool things, like using the large amount of open-source code data on GitHub to learn predictive coding, code analytics, and stuff like that, but it literally takes days to run. I don't think I want to attempt that on stage.

This is a very simple Kubeflow pipeline. What it tries to do is tell pictures of tacos from pictures of burritos. It's a very important problem that we all need to solve. It does the pre-processing, the training, and the registering. The register step just goes into Azure ML and registers the pipeline, and then it registers the model. Then I can deploy the model as a service using Azure ML. Again, I could potentially do this with Kubeflow as well.

The other interesting thing is that if we've already done one of the steps, we don't have to do it again. Pre-processing, for instance, takes a really long time; if we've already processed all of the data and learned from it, Kubeflow will just skip that step because it has already run. That allows us to optimize our steps.

Then we come to the last step of this. There's a release pipeline, and all it does is run some pretty straightforward scripts. Actually, let me show you something else: I'm going to show you the test stage, since it's more informative. I'm using Azure ML, which allows me to just run some Azure CLI commands and say, "Go deploy this model." In this case, I'm deploying this model into Azure Container Instances, which is my test environment. Then I'm just throwing some images at it and asking, "Is this a burrito image?" It tells me that it is a burrito image, but it's really not sure; its confidence is very low. In this case, I'm not actually doing anything with this information, so I'm not looking at the accuracy, but I could go and start looking at the parameters of the model and try to estimate whether it's good before I put it in production.
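The "it's a burrito, but not very confidently" result is exactly what an automated smoke test could act on. Here is a minimal sketch. It assumes a hypothetical `score` function that returns a label and a confidence for an image; in the demo, that would be a call to the test deployment. The threshold of 0.7 is an arbitrary example value.

```python
from typing import Callable

Score = Callable[[str], tuple[str, float]]  # image name -> (label, confidence)


def smoke_test(
    score: Score,
    samples: list[tuple[str, str]],
    min_confidence: float = 0.7,
) -> list[tuple[str, str, float]]:
    """Return the samples that were misclassified or scored below min_confidence."""
    failures = []
    for image, expected in samples:
        label, confidence = score(image)
        if label != expected or confidence < min_confidence:
            failures.append((image, label, confidence))
    return failures


# Fake scorer standing in for the test deployment.
FAKE_RESULTS = {"taco_01.jpg": ("taco", 0.93), "burrito_01.jpg": ("burrito", 0.55)}


def fake_score(image: str) -> tuple[str, float]:
    return FAKE_RESULTS[image]


failures = smoke_test(fake_score, [("taco_01.jpg", "taco"), ("burrito_01.jpg", "burrito")])
print(failures)  # [('burrito_01.jpg', 'burrito', 0.55)]
```

In a release pipeline, a non-empty `failures` list would fail the stage, for example by exiting with a non-zero status, before anything reaches production.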

Then, the next thing I do is deploy it to a "production environment". This one actually has an approval check, so I'm going to approve it. This is true of pretty much every production deployment I see out there: most people don't deploy to production without having a human look at the process, so it's not only for ML models. But definitely for ML models, at least from what I've seen, there isn't a lot of automated testing that you could just rely on without a human validating that this stuff actually looks good before it goes to production.
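The release pipeline moves the same build artifact through stages, with a human approval in front of production. That flow can be sketched as follows. The stage names, the artifact digest, and the `approve` callback are hypothetical. Azure DevOps provides approvals as a built-in feature, so you would not normally write this gate yourself.

```python
from typing import Callable

STAGES = ["test", "production"]  # hypothetical stage names


def release(artifact: str, deployed: dict[str, str], approve: Callable[[str], bool]) -> str:
    """Deploy one immutable artifact stage by stage; production needs a human approval."""
    for stage in STAGES:
        if stage == "production" and not approve(artifact):
            return f"stopped before {stage}"
        deployed[stage] = artifact  # the same artifact everywhere, never a rebuild
    return "released"


rejected: dict[str, str] = {}
approved: dict[str, str] = {}
status_rejected = release("sha256:1f2e", rejected, approve=lambda artifact: False)
status_approved = release("sha256:1f2e", approved, approve=lambda artifact: True)
print(status_rejected, rejected)  # stopped before production {'test': 'sha256:1f2e'}
print(status_approved, approved)  # released {'test': 'sha256:1f2e', 'production': 'sha256:1f2e'}
```

Promoting one artifact, rather than rebuilding per environment, guarantees that what was checked in test is what runs in production.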

This pipeline could be further improved the next time I have free time. We could actually look at accuracy, look at model drift, go back into retraining, and so on. It took me about a week to build this, and I did have examples from other people, so this takes time. But now, when I want to put a new model into production, I can do it in the span of an hour instead of 11 months, and that's a huge improvement. It probably won't cost you too much money to do this, and it will definitely save you a lot of hours compared to trying to do it all on your own.
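Model drift checking is one of those improvements. A common, simple approach, not shown in the demo, compares the distribution of a feature or of the model's scores at training time against what the model sees in production. One such measure is the population stability index (PSI). The sketch below uses equal-width bins taken from the training data. The often-quoted rule of thumb that a PSI above about 0.25 signals a significant shift is a convention, not a guarantee.

```python
import math


def population_stability_index(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """PSI = sum((a - e) * ln(a / e)) over bins, with bin edges from the expected (training) data."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all training values are equal

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            index = int((v - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        # A small floor keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


training_scores = [i / 100 for i in range(100)]     # spread evenly over 0.00-0.99
live_same = list(training_scores)
live_shifted = [0.5 + i / 200 for i in range(100)]  # everything moved to 0.50-0.995

print(round(population_stability_index(training_scores, live_same), 4))  # 0.0
print(population_stability_index(training_scores, live_shifted) > 0.25)  # True
```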

The other thing I don't know if I mentioned is that tools such as Kubeflow and Azure ML allow you to take models that come in different formats (you might be using TensorFlow or something else) and just deploy them as a service. You could also do it on your own; you have to write some code, but it's possible. Again, I like tools that do stuff for me, so...

Even a Simple CI/CD Pipeline Is Better Than None

If you take only one thing away from this talk, let it be this: even a simple CI/CD pipeline is better than no pipeline. I know that if you're working with a team of data scientists who are used to writing code on their own machines and doing this stuff manually, getting them to use pipelines is going to be a change of mindset, but it is definitely worth it in the long run. Also, don't try to ask them to deploy Kubernetes; I don't think that's going to go very well.

Change is the only constant in life and that's why I've been organizing DevOpsDays for six years, because I believe that automation actually helps us eliminate issues.

AI Ethics

I want to finish on a particular note, and that is AI ethics. One thing I learned while learning more about this stuff is that bias is actually a property of information, not a property of humans. If you build algorithms and give them a particular set of data, they're going to learn about the state of the world from that data. There are already examples in the industry of algorithms being biased against certain subsets of the population.

One is the racial example. These algorithms are actually being used by judges and by the police to estimate whether you're likely to commit a crime, and they definitely have some racial biases in them. The other example is that ads for CEO jobs were displayed only to men because, guess what, the dataset suggested that only men can be CEOs. This is complicated stuff, and it's something we must think about when we develop AI, because these algorithms are actually deciding your next mortgage, where your kid goes to school, your credit score, and so on. You definitely want to think about this when you're working on ML. So, build AI responsibly.
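One concrete, if limited, check a pipeline can run is to compare how often a model selects people from different groups. The sketch below computes per-group selection rates and their ratio. The group names and outcomes are invented. The 0.8 cutoff borrows the "four-fifths" heuristic from US employment-selection guidelines. A passing ratio does not prove a model is fair; it only catches one obvious symptom.

```python
from collections import defaultdict


def selection_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of positive decisions (e.g. 'ad shown') per group."""
    totals: dict[str, int] = defaultdict(int)
    selected: dict[str, int] = defaultdict(int)
    for group, was_selected in outcomes:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest group rate divided by highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Invented outcomes: group_a saw the ad 8 times out of 10, group_b 2 times out of 10.
outcomes = [("group_a", i < 8) for i in range(10)] + [("group_b", i < 2) for i in range(10)]
rates = selection_rates(outcomes)
ratio = disparate_impact_ratio(rates)
print(rates)                                      # {'group_a': 0.8, 'group_b': 0.2}
print(f"ratio={ratio:.2f}, flagged={ratio < 0.8}")  # ratio=0.25, flagged=True
```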

Questions and Answers

Participant 1: I just had a question about your pipeline, I saw that you ran the model on ACI, Azure Container Instances. Does that support GPUs?

Rosenbaum: I don't remember if ACI does, but AKS definitely does. My production instance does use GPUs. I would have to go check whether ACI supports them as well.

Participant 1: We have a similar pipeline that we've implemented, not using Azure DevOps but using ACI. We have to keep a VM up at all times in order to run our model.

Rosenbaum: Yes. You can definitely go in through the AKS stuff, that definitely supports GPUs.

Participant 2: How do you monitor your deployment and when do you decide that you need to redeploy?

Rosenbaum: I think this is more of a data-science question. Monitoring the deployment in the sense of "it returns 200 OK" is easy to do, but knowing that you have model drift is probably harder. I'm probably not the best person to answer that question.

Participant 2: But you probably have such a mechanism in place.

Rosenbaum: Yes. Internally, at Microsoft, yes, and I would have to find out what it is.

Participant 3: Do you have any advice for versioning the data once the data is too big to put in Git, which is usually pretty soon?

Rosenbaum: With Azure ML, you can actually commit the dataset to Azure ML, and it will have a version attached to it. It also tags the models with their versions. For Kubeflow, you have to find some type of storage. I wouldn't put it in Git; it's always too big to put in Git. I would probably put it somewhere in relatively cheap storage. Then you do have to solve the problem of how you version it.

Participant 4: One of our data scientists was thinking about iterating on their CNN using different branches in Git and doing multiple deploys. Have you had experience with that? Do you have any recommendations on how to experiment and iterate on that very quickly?

Rosenbaum: What we do at Microsoft is control the flow through the pipeline, not through branches. I see a lot of people out there using branches because it's easier: you attach a branch to an environment, and it allows you to isolate stuff. I would say it's still a valid way to do this, and it might make it easier for you to have multiple versions at the same time. The other way is through the pipeline: the pipeline knows which environment you're deploying to, which means you're putting the same code through the whole process. Whatever was in your dev environment, the thing you looked at, is the same thing that ends up in production.

Participant 5: What if your data set is too big to be checked in to some version control?

Rosenbaum: I think the gentleman already asked that question. I definitely wouldn't check my dataset into source control. Again, you put it in storage, some cheap Blob storage somewhere, because these datasets can be very large. But if you keep it in storage, you have to find a way to version it and to know which dataset trained which model, so that if you change something, it propagates all the way through.

Participant 6: Building on the versioning question and how it relates to software CI/CD, have you had an incident where the checks for the model weren't sufficient, so you actually released bad data? What is the equivalent of a rollback, and does purging the bad data take a while?

Rosenbaum: The equivalent of a rollback, I would say, is just going to the previous version. In the case of Azure ML, you can actually go and deploy any version, so that makes it easier for you. I don't know how easy it would be to roll back in Kubeflow. You might actually need to go and retrain the model, which does take a lot of time, and then the challenge is that you have to know that the data was the same data. I see the same thing with code: people roll forward rather than roll back a lot.
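Rolling back is cheap when every trained model is kept as an addressable version, because "roll back" then means "redeploy the previous version" rather than "retrain". Below is a minimal in-memory sketch of such a registry. It stands in for what Azure ML's model registry, or your own storage-plus-metadata setup, would provide. The class, its methods, and the artifact paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    versions: list[dict] = field(default_factory=list)    # every registered version, oldest first
    deployments: list[int] = field(default_factory=list)  # history of deployed version numbers

    def register(self, artifact_uri: str, dataset_sha256: str) -> int:
        """Store a trained model's location with the data it came from; versions start at 1."""
        self.versions.append({"artifact_uri": artifact_uri, "dataset_sha256": dataset_sha256})
        return len(self.versions)

    def deploy(self, version: int) -> dict:
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.deployments.append(version)
        return self.versions[version - 1]

    def rollback(self) -> dict:
        """Redeploy whatever was live before the current deployment; no retraining needed."""
        if len(self.deployments) < 2:
            raise RuntimeError("nothing earlier to roll back to")
        self.deployments.pop()             # drop the bad deployment
        previous = self.deployments.pop()  # deploy() re-appends it below
        return self.deploy(previous)


registry = ModelRegistry()
v1 = registry.register("models/tacos-vs-burritos/1", "dataset-hash-1")
v2 = registry.register("models/tacos-vs-burritos/2", "dataset-hash-2")
registry.deploy(v1)
registry.deploy(v2)  # suppose v2 turns out to be bad
restored = registry.rollback()
print(restored["artifact_uri"], registry.deployments)  # models/tacos-vs-burritos/1 [1]
```

Storing the dataset fingerprint with each version also answers the "was it the same data?" question without retraining.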

For the second question, if you have a way to tag model versions, you can deploy a previous one fairly quickly. Some of these models train for hours or days, so retraining can take a while. Versioning is important.

Participant 7: We had questions about the datasets; my question is about the models. How do you keep them version-controlled when they become too big? We have three things: the code itself, the models, and the datasets. Can you elaborate a little more on version control, given that they get too big to check into GitHub?

Rosenbaum: In Azure ML, model versioning is just part of the service. This is actually one of the cool things, I think. I don't have to put the model in version control because I have this hub that keeps my versions for me, and I can actually click on one and deploy it into the production environment. If you're not doing that, you have to figure out a way. You could put models in storage, but you have to automate the process of keeping them there and knowing which version is which.

Participant 8: Can you go back to the first step you have there, profiling the model? What does it do?

Rosenbaum: Profiling the model. This is a specific command that I can run in Azure ML, and it identifies the best compute cluster to run the model on. Essentially, it tells me what CPU or GPU I need and how much memory. I'm actually faking this step because it takes about an hour to run. If you are running actual production models, it can be very useful, because then you know the best compute cluster to run them on.




Recorded at: Jan 27, 2020