Iterating on Models on Operating ML



Monte Zweben and Roland Meertens discuss the challenges in building, maintaining, and operating machine learning models.


Monte Zweben is CEO and co-founder of Splice Machine @splicemachine. Roland Meertens is Product Manager @annotell.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.


Jördening: We want to talk about iterating on ML models and deploying them. I'm Jendrik. I work here at Nooxit, being the CTO, doing a lot of ML Ops and deploying lots of machine learning into production.

Meertens: I'm Roland. I'm currently Product Manager at Annotell. I really like robots and machine learning. The problems I'm thinking about on a daily basis are, how do you collect data to improve your model? How do you train your model? How do you validate that it's correct?

Zweben: I'm Monte. I'm the co-founder and CEO of Splice Machine. We make a machine learning platform for real-time AI, and specifically produce components like feature stores in ML Ops and experimentation platforms for data scientists, data engineers, and machine learning engineers, and deliver industrial solutions on top of that set of tools for outage avoidance, asset performance, and anomaly detection for industrial clients.

Experience Deploying ML

Jördening: What was your experience deploying ML in the beginning? Roland, for you, it's probably nowadays, and Monte for you it's then why did you jump into more the ML Ops part, or in your case the feature stores part?

Meertens: I think there's this famous paper by Google, I saw Francesca also referencing it, about hidden technical debt in machine learning systems. They have this picture where they basically show the amount of machine learning code and the amount of all the other code. Basically, I think the machine learning algorithm is only a tiny part, eventually, of the whole machine learning infrastructure. I think another famous example of someone who confirms this is Andrej Karpathy, who is now the lead computer vision engineer at Tesla, where he says, yes, when I was a PhD student, I spent a lot of my time on machine learning algorithms. Now that I'm at Tesla, 90% of my time, I care about data and ML Ops, and bringing this model into production. I think that ML Ops is really one of the most important parts of machine learning. It's really how you go from the knowledge of how you can build an algorithm to actually making customers happy and verifying that everything is correct. That's why I like it so much, or that's why I moved more into that direction.

Zweben: I had a very similar experience, two companies that I worked at had an experience that was very similar. It's not just the amount of code. It's also the amount of people and the amount of time. The same thing goes for the amount of people that you need to hire for essentially assembling the pipelines into the models and pushing the models back out into the applications. You have a lot of people that are dedicated to this. From just a pure business sense, the P&Ls of companies that do machine learning, have a tremendous expense on the number of people. Then, if you look at the amount of time spent in operational machine learning environments, this amount of time is spent on experimentation, and this amount of time is spent on feature engineering and model Ops, and everything else.

At Blue Martini, before the big data architecture was available, back in the old days of machine learning where I had started commercializing machine learning, this was all about getting the data from my transactional Oracle systems to Oracle analytical systems, just making that the foundation for the machine learning. Later, when I was chairman and then CEO of Rocket Fuel, we had a 20 petabyte HBase architecture for serving models, and a tremendous Hadoop cluster for essentially the underpinnings of the features and for the training environments. It just was so many people. It was hundreds of people. The reason why I started Splice Machine was realizing that what we did at Rocket Fuel couldn't be the way that your arbitrary insurance company, or your manufacturer, or your government agency could do it anymore. It was just not cost effective. We focused on this problem to just bring down those burdens in terms of amount of code, amount of time, and amount of people.

Jördening: I think it's similar to what we had in the beginning with websites. Everyone just wanted to have a small text and an image on the website. Then it was like, please find a contractor who builds me a web page, hosts it, and everything. Nowadays, you go to Webflow, or another provider, and you're like, here is my text, and please put it in a nice template, and you're done. I found that as well very interesting when Francesca showed the ML engineer, the data scientist, and the data engineer. Ideally, you don't want a data engineer and an ML engineer for all your projects. You want someone managing your infrastructure, and then someone writing the actual interesting code.

Groundhog Day in Machine Learning

Zweben: I think, if I can add a comment to that, that builds upon that role-based interpretation. There's also a metaphor, back to the old web days, that we're experiencing again, like a Groundhog Day in machine learning. I remember back when anything that had to do with the web, in a company, would go through this group of 10 people who were the internet guys or girls. They were the people that anything that had to do with the internet went through. If you needed to change something on a website, or do anything that had to do with the web, it went through this centralized group. Of course, that was a bottleneck. It was crazy. Now organizations have web presences and mobile apps, all through the enterprise, all different organizations have their properties with which they interact with customers, or suppliers, or what have you.

The same thing is happening to us in data science. If you go to a large telecommunications provider, or you go to a large insurance company, or a large bank, or a large healthcare provider, and you start talking about machine learning, you get pointed to the data science group. There's one of them, and there's this one big team, generally speaking. We're seeing this centralization and silo of data scientists, data engineers, and machine learning engineers, because of this complexity. If we really want to see the breakout of machine learning into the enterprise, not only do you want to reduce the amount of data engineering and machine learning engineering that you need for every production model, like you said, Jendrik. You've got to democratize this group, so that there are data scientists in each part of the enterprise that can actually facilitate this process. We're not even close to being there yet. Hopefully, conferences like this and technologies that we're all working on, facilitate that democratization. That's where we've got to get to.

Programming 1.0 to Programming 2.0

Meertens: Maybe to expand on that. I think if we're talking about this change from, you have a company with a small web team. I think nowadays, way more companies are web companies than they maybe even realize themselves. If you talk about, for example, your bank. Your bank is not just a place where you get money. It's really a web app. That's the most important thing. What I'm personally excited about in terms of machine learning is this whole idea of programming 2.0. Right now, you program by just implementing the algorithm. For the really difficult algorithms, you program by example, where you say, if you see this image, or if this image is the input, then you have to do that. What I'm really excited about is how you go from this programming 1.0, where people manually construct features, to this programming 2.0, where maybe everybody in your company is responsible for labeling things and indicating that with this input that should happen, with that input that should happen.

Zweben: That crowdsourcing of labeling, big topic. That's a really good one.

Meertens: Maybe not about crowdsourcing, maybe it should also be seen as a more intelligent thing. Because right now, people often go to Amazon Mechanical Turk or something. In my case, I work at an annotation company, so people come to us. You should not see it as a dumb problem, you should see it as a smart problem. You should really think about, if I get this image, what should the output be? In that sense, it's more about you're employing programmers at that moment, rather than someone just giving you a dumb input about what the thing is.

Zweben: That relates a little bit to a research topic, and you're on the bleeding edge of deploying technology. Research-wise, when we start to see the ability of incorporating more symbolic approaches to machine learning that were quite popular before deep learning was popular. They all emerged at the same time. You have more symbolic representations of human knowledge that come from experts. Then we have all these machine learning techniques that are much more induction oriented, rather than deduction oriented, and they infer from lots of data. We've got to get to the point where we can blend these. Where just like you're saying, it would be great if it were intentional to map an image to some behavior. In general, if we can seed our models with knowledge that we already know, that's going to be really powerful. That's PhD material, I think now.

Feature Stores

Meertens: We want to be at some point, like feature stores. Are there feature stores where you just encode knowledge about what certain jobs are? I think nowadays, people already use pre-training or transfer learning. Maybe later, you will have really big hubs where you can just find different pre-trained models, where you say, "I have a problem which requires knowledge about what's going on on the roads. Let me use this pre-trained model." "I have a model which is working with faces for my new dating app, or something, let me use this pre-trained model." You really have feature stores with already preprogrammed programs you can use.

Zweben: One of the things that we tried to design into our feature store, not to plug our feature store, but in general for feature stores, is that there's no reason why you can't think of features in a feature store as models. There's a composition that needs to take place, so that you build smaller models, like lifetime value of a customer. If you know a customer's lifetime value, which may be predicted by a model, shouldn't that be a feature to something that's recommending to that customer? Those are the kinds of things that I think you'll start to see, where feature stores are model stores, to your point, and they become quite robust.
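The composition described here, where a model's prediction is served as just another feature, can be sketched roughly as follows. This is a minimal illustration, not Splice Machine's API: the in-memory `feature_store` dictionary, the `register` helper, and the lifetime-value formula are all hypothetical stand-ins.

```python
from typing import Callable, Dict

# Hypothetical in-memory "feature store": feature name -> function of a customer record.
FeatureFn = Callable[[dict], float]
feature_store: Dict[str, FeatureFn] = {}


def register(name: str, fn: FeatureFn) -> None:
    """Publish a feature under a shared name so other models can reuse it."""
    feature_store[name] = fn


def predicted_lifetime_value(customer: dict) -> float:
    """Stand-in for a trained model; the formula is invented for illustration."""
    return customer["avg_order_value"] * customer["orders_per_year"] * 3.0


# A raw feature and a model-backed feature are registered the same way.
register("avg_order_value", lambda c: c["avg_order_value"])
register("lifetime_value", predicted_lifetime_value)


def recommendation_inputs(customer: dict) -> Dict[str, float]:
    """A downstream recommender reads every feature uniformly, model-backed or not."""
    return {name: fn(customer) for name, fn in feature_store.items()}


customer = {"avg_order_value": 40.0, "orders_per_year": 5}
features = recommendation_inputs(customer)
```

The recommender does not need to know that `lifetime_value` came from another model, which is the sense in which a feature store can double as a model store.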

Do Not Repeat Yourself Patterns in ML

Jördening: It's super important as well, because I think what we're currently lacking a lot in ML is the do-not-repeat-yourself pattern. I don't know how many people wrote something to encode dates into day of week. Literally, half of the data scientists have done it by now, and actually only one person should have done it and tested it, and you would have been done.
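As a small illustration of the day-of-week example, a shared, tested helper might look like the sketch below. The function name and the one-hot encoding are assumptions chosen for illustration; the point is only that the transformation is written and tested once.

```python
from datetime import date
from typing import List


def day_of_week_one_hot(d: date) -> List[int]:
    """Encode a date as a 7-element one-hot vector, Monday first."""
    idx = d.weekday()  # date.weekday() returns 0 for Monday through 6 for Sunday
    return [1 if i == idx else 0 for i in range(7)]


# 23 April 2022 was a Saturday, so position 5 is set.
encoded = day_of_week_one_hot(date(2022, 4, 23))
```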

Zweben: Then you get subtle examples of the problem. I'll use a retail example just because that's accessible for a lot of our audience here. Say you have two data scientists who are building models, one trying to acquire new customers and another one trying to retain customers, and they're collecting up sales data or just aggregation data. One data scientist is collecting up summaries of behavior that include sales tax, and the other one doesn't include sales tax. They both think that they're using the same summaries, but all of a sudden, they may actually end up doing different things, because they define their feature colloquially. In English, they may call it the same thing, sales history, but they've each got a slightly different definition of it, and their models behave differently. It's an interesting problem of just being able to start to reuse this work, and not have it be duplicated over again, for sure.
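The sales-tax mismatch can be made concrete with a toy sketch. The `Order` type, the amounts, and the `include_tax` flag are invented for illustration; the idea is that one shared definition forces the choice to be explicit instead of hiding it behind the name "sales history".

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Order:
    net_amount: float  # price before tax
    sales_tax: float


def sales_history(orders: Iterable[Order], include_tax: bool) -> float:
    """One shared definition; callers must say which variant they mean."""
    return sum(o.net_amount + (o.sales_tax if include_tax else 0.0) for o in orders)


orders = [Order(100.0, 8.0), Order(50.0, 4.0)]

# Two features that would both be called "sales history" in conversation:
with_tax = sales_history(orders, include_tax=True)
without_tax = sales_history(orders, include_tax=False)
```

With these invented orders the two variants differ by the total tax, which is enough for two models trained on "the same" feature to behave differently.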

ML Engineering and ML Ops as Part of Computer Science

Jördening: Do you believe that ML engineering and ML Ops should be part of the core CS material for upcoming CS ML students?

I definitely think so, because you can't teach training models without teaching how can you compare your model to the student next to you.

Zweben: I agree. My colleague Ben Epstein and I, organized a course at WashU in St. Louis, Washington University. He did most of the teaching. We put together this full credit course that was all about cloud computing, big data, and ML. It strung together some of these concepts of just everything from the beginning of like, how do you create a compute instance that's elastic, all the way through to, how do you push a model out? I'm hopeful that the curriculums for computer science start to do this. I'm also on the advisory board for the dean of computer science at Carnegie Mellon University. This is emerging. This question that was asked, is really emerging as a standard, where practical engineering standards beyond just ML, but just in general, become part of the curriculum for traditional computer science. Yes, you'll learn about algorithms, and data structures, and compilers, and operating systems, and databases, but you also learn the software engineering techniques to put these all together. The DevOps techniques on top of that, and that proves true for machine learning too.

Meertens: I think we can even go a step further, I think, it should be taught at primary school. There's this thing from Google, which is called Teachable Machine, which I think is a great tool to explain to children, what's the basis of machine learning? You show some examples of a cat, you show some examples of a dog with Teachable Machine data with your webcam. I use this as my way of doing machine learning. If I think, I want to solve this simple problem. I just go to Teachable Machine and train my model there. It's a great way to explain to children already, what is this programming 2.0. I really hope that easy tools like this, if they are being used by more children, are maybe integrated into other programming tools for children, like Scratch, the programming language where you just draw boxes.

Jördening: I know it as LabVIEW.

Zweben: Scratch also. Also, there's the Robot. There's a bunch of these different simple visual programming languages that different high school programs and even elementary school programs are adopting. I think you're absolutely right.

Meertens: If we're talking about the future of ML Ops, and how do you work with machine learning models, I think that's the dream. That even like a primary school child can go and show some examples using the webcam of their laptop, or their iPad, whatever, and they can train a simple model on that, and they deploy it. Like no code required. I think that's also what Francesca talked about with, you should eventually have a no-code deployment for your model. I think that should be the dream. That should be the future. You can easily create machine learning models for whatever you want.

Zweben: I think that is the dream, although I think the days of not doing feature engineering for many domains, signal processing, I think we're getting pretty good at being able to do some transfer learning and some things like that. For real deep domains, I think there's still going to be a human data scientist in a loop. The secret isn't going to be which algorithm to use, it's still going to be in feature engineering and the representation of the features. That's a religious conversation and a debate to have in itself. I agree with you, that's our dream is to make it more turnkey.


Meertens: Maybe one thing, if we're just continuing about dreams, one thing I really like is GPT-3. If you have access to a GPT-3, it is pretty simple to make any natural language processing application. I think the GPT-3 moment should still come for image recognition, for image based data. I think that's where we're heading where you just have a couple of pre-trained features, you just have your feature store, and you can expand on that.

Zweben: GPT-3 I think is just the beginning of us learning transfer learning. I think that's the secret. If we can really start to see transfer, it's going to break machine learning into an entirely different discipline. It's really exciting. Albeit, not everybody has the resources to put together that robust of a deep learning network, but it's definitely really interesting.

Transfer Learning in Audio

Jördening: The transfer learning started in images. First, we started using ImageNet as a pre-trained network for everything. Now we do it in language. I'm actually interested in when it will occur in audio. It will be otherwise a repeated task as well. Who wants to convert audio to text? No, it's like we want to build something that can use audio to make something meaningful? Text or the instructions are just the layer in between that everyone has to solve.

Zweben: I think that some of the Amazon tools for spoken language understanding are the beginnings of some of that. I agree with you. I think that the signal processing domains will be the first ones to benefit from transfer learning, because of the embeddings that emerge in some of these new architectures. The autoencoder architecture, that really has opened up a lot of, I think opportunity for us too. Roland, do you deploy a lot of autoencoders in the autonomous vehicle world?


Meertens: In terms of autoencoders, what's really helpful there is being able to construct features out of unannotated data. For autonomous vehicles, that is the big problem that you often have a lot of data. If you think about Tesla, for example, they have a big fleet, but annotating all the data is too expensive, so you have to choose what to annotate. I think autoencoders at least allow you to already find some distribution of latent features, which make it a bit easier to choose what to annotate.
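One way latent features could help choose what to annotate is to pick samples that are spread out in latent space. The sketch below assumes the latent vectors have already been produced by an autoencoder's encoder, which is not shown, and it uses greedy farthest-point selection, a technique swapped in for illustration rather than one named in the discussion.

```python
import math
from typing import List, Sequence


def pick_diverse(latents: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedy farthest-point selection: return indices of up to k spread-out samples."""
    if k <= 0 or not latents:
        return []
    chosen = [0]  # start from the first sample (an arbitrary choice)
    # Distance from every sample to its nearest already-chosen sample.
    nearest = [math.dist(v, latents[0]) for v in latents]
    while len(chosen) < min(k, len(latents)):
        nxt = max(range(len(latents)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(v, latents[nxt])) for d, v in zip(nearest, latents)]
    return chosen


# Hypothetical 2-D latent vectors: a cluster near the origin and a cluster near (5, 5).
latents = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (5.1, 5.0)]
to_annotate = pick_diverse(latents, 2)
```

With a budget of two annotations, the selection takes one sample from each cluster instead of two near-duplicates.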

Jördening: In general, autoencoders are just the task of understanding something. Most babies, when they start growing up, the first thing they do is just recognize the world and try to make sense of it. That's what an autoencoder does. Even we as humans, before we start thinking about anything or connecting anything together, the first thing we try to do is basically just make sense of our sensors.

Zweben: What I find so remarkable, is that you read a lot of our deep learning papers that are coming from the research community, so much of the work right now is I tweaked it with this parameter, I have this many hidden units. It's still very trial and error in our deployment. That's what's exciting about the space that we're really just scratching the surface of turning this into a real science. It's going to be really exciting to see what these techniques allow us to do, all of the language work. I used to do computational linguistics, a long time ago, and that was all based on deep linguistic theories that would break sounds into tokens and words into subject and object and action triples, and then you would break that into contextual discourse elements. It was all done symbolically. Now there isn't a single computational linguistics project in the world that does it that way. It all does it with BERT, or some other autoencoding type of approach. It's a remarkable change in an entire science now.

The Maturity Adoption Roadmap for No-Code for ML

Jördening: Where do you think that no code for ML is on the maturity adoption roadmap? Because that would be the next step. You don't need to know linguistic theory. You don't need to know programming. It's more what Roland said. It's like, you have something, you show it some examples, tell it what it is. Then it's good to go.

Meertens: I think it's really at a very early stage of maturity. If we go back to your great example about web development, Monte, nowadays, if you want to make a website, you just use WordPress, or one of the many website creators YouTubers recommend whenever you watch a YouTube video. There will be something like that for machine learning at some point as well, where YouTubers will recommend, if you want to make a program, use this, use that, just making it really easy for everyday businesses to use machine learning. Right now, I think Teachable Machine, which is still for children, is one of the few out there. Maybe another thing which is out there is Google's AutoML. Google's AutoML also does a pretty good job at building and tweaking models, given data, but it's already way more technical.

Zweben: I think that time is the problem that will make it difficult to automate all of machine learning with AutoML techniques. Here's what I mean by that. When you have domains where you can take a snapshot of data, and then build a prediction on it, and it just is as simple as that, I think AutoML is going to become very feasible. When you need to look at the effect of time on a domain, and figure out how to translate that into features, I think it gets more complicated, and you need a data scientist. As an example, we talked a little bit about e-commerce examples and shopping. It's hard to just take people's shopping carts and just figure out exactly what to recommend to them. You have to look at what they browsed before and what they bought before. That's behavior over time. Somehow you have to take those behaviors over time, and summarize them through aggregation, like, how frequently did you purchase this category or this brand? When was the last time you purchased this category, or brand, or this product? What was your average purchase price? That becomes recency, frequency, and monetary value aggregations that analysts who study sales history know how to do. Trying to derive those kinds of transformations on the fly automatically is going to be hard.
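The recency, frequency, and monetary value aggregations mentioned here can be sketched for a single customer as follows. The `Purchase` record, the field names, and the sample data are assumptions made for illustration.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Purchase(NamedTuple):
    day: date
    category: str
    price: float


def rfm(purchases: List[Purchase], category: str, today: date) -> Optional[dict]:
    """Summarize one customer's behavior over time for a single category."""
    matching = [p for p in purchases if p.category == category]
    if not matching:
        return None  # no history in this category
    last = max(p.day for p in matching)
    return {
        "recency_days": (today - last).days,  # days since the last purchase
        "frequency": len(matching),  # how many purchases
        "monetary_avg": sum(p.price for p in matching) / len(matching),  # average price
    }


history = [
    Purchase(date(2022, 1, 10), "shoes", 60.0),
    Purchase(date(2022, 2, 1), "books", 15.0),
    Purchase(date(2022, 3, 1), "shoes", 80.0),
]
shoe_features = rfm(history, "shoes", today=date(2022, 3, 11))
```

Writing these by hand is easy once an analyst has decided they matter; the difficulty raised in the discussion is discovering which such transformations to compute automatically.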

Maybe another good example is, you're in a factory. You're trying to predict why some turbine is going to fail. Upstream in the factory, maybe somebody increased the pressure in something five minutes ago, and that delay in time means an action that was taken a little while ago has some causal impact on something that's happening now. It's really hard for systems to discover those temporal relationships. That's why I think AutoML, like we've been talking about, is a few years off, maybe 10 years off in many domains, but some domains where you don't have that temporal causation might be applicable for that today.
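A lagged feature is one simple way to expose such a delayed effect to a model. The sketch below assumes one reading per minute and assumes the five-minute delay is already known; finding that delay automatically is exactly the hard part described above. The readings are invented.

```python
from typing import List, Optional


def lagged(values: List[float], lag: int) -> List[Optional[float]]:
    """Align values[i - lag] with position i; None where there is no history yet."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [values[i - lag] if i >= lag else None for i in range(len(values))]


# Hypothetical upstream pressure, one reading per minute, with a spike at minute 2.
pressure = [1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Feature for the turbine model: what the pressure was five minutes earlier.
pressure_5_min_ago = lagged(pressure, 5)
```

The spike at minute 2 shows up in the feature at minute 7, which is when its effect on the turbine would be observed under the assumed delay.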

Signal vs. Noise

Jördening: If those tools get available, won't there be too many noisy models out there? Would you still need ML engineers and data scientists to interpret the results to make sure the models are doing right things?

I think actually, that's a good analogy to the internet nowadays. Some people just started deploying smart TVs, and I think someone from Netflix mentioned that they still support 10 year old TVs, which just have HTTP2, because never, ever did anyone think about building an update mechanism in there, because it would have been just too expensive. You can see it in the botnets: we have a lot of noisy devices out there and nobody really takes care of what they do. I actually assume we will have something similar happen with machine learning at one point. It probably won't be so problematic, because they can't talk to each other like internet devices do nowadays.

Meertens: I think it can actually be very problematic. If you just look at the problem of self-driving cars, the world around us changes. I remember the day that they introduced these electric scooters, where, for me, when I was working at the self-driving car company, suddenly tiny objects were going 30 kilometers an hour. We didn't expect that. We had no training data. If you just think about the old fashioned way of deploying software, you deploy it in the real world, and there are no updates. With machine learning, in this case, that's a dangerous situation, because your entire world changes on exactly the day that Germany allows these regulations with fast moving companies. I think in that sense, there will still be a lot of need for machine learning engineers and data scientists to think about, how do we prevent these problems? I also think there's a big need for knowing the mathematics behind the data science, so that you know what could cause this problem. How can we prevent it? How do you do versioning of your machine learning algorithms? Things change over time. You maybe even have feedback effects, where if you have a model which is detecting specific things, and then you use that again to select more data, then you just have this runaway effect at some point. As long as the world keeps changing, and is dynamic, I think you have to keep updating your models, and you have to think really well about what you do about them, before you deploy them.
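A very rough way to notice that the world has changed is to compare a feature's live values with its training values. The sketch below is a deliberately simple check, a mean shift measured in training standard deviations, with an arbitrary threshold; the speed values are invented, and real monitoring would likely use more robust statistics.

```python
from statistics import mean, stdev
from typing import Sequence


def drift_score(train: Sequence[float], live: Sequence[float]) -> float:
    """Shift of the live mean from the training mean, in training standard deviations."""
    sd = stdev(train)
    shift = abs(mean(live) - mean(train))
    if sd == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / sd


def has_drifted(train: Sequence[float], live: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag the feature for review; the threshold of 3.0 is an arbitrary assumption."""
    return drift_score(train, live) > threshold


# Hypothetical speeds (km/h) of small objects: pedestrians at training time,
# then a live sample after electric scooters appear.
train_speeds = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]
live_speeds = [25.0, 30.0, 28.0, 5.0, 27.0]

needs_review = has_drifted(train_speeds, live_speeds)
```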

Zweben: I think there's a metaphor for this in another area of computer science that I was exposed to early in my career, which was the planning and scheduling world, where you do AI systems for planning and scheduling. This was true in military domains and space domains. I worked for NASA for a while. The project that I had was preparing the space shuttle for launch and scheduling all the operations to make that happen. What we realized is that everyone in the computer science community was so focused on scheduling algorithms. Basically, these would be using optimization algorithms. In the AI world, we use heuristic search and operations research. They would use mixed integer linear programming algorithms to do this. Everyone would write papers about the best plan. What was true is that as soon as you come up with a plan, and you start the first step to execute it, the plan is obsolete, because everything in the real world changed.

The real problem was re-planning and rescheduling. That's what we focused on: how can we iterate really quickly, and be able to take an existing plan and patch it up a little? Not throw it all away and come up with an entirely new plan that was optimal, but just patch it up. I think that metaphor holds for us in machine learning. We're all spending so much time on coming up with good models. The point that you made, Roland, is so true, that as soon as we put a model into execution, the world in which it operates has changed. Now that model is going to need to change. Becoming good at monitoring and getting good at patching these models in ways that they get incrementally better to adapt to their changing environments is going to be where I think most data scientists and machine learning engineers end up spending their time, rather than endlessly experimenting to come up with that first model.

ML for Critical Systems

Jördening: I think as well, we should probably see ML a bit more as a communicative task, because a lot of those problems would be solved if you just talked to the people who work there. On the launch pad of the space shuttle, if you say, "I'll make you the perfect plan for one hour," someone who works there will tell you, "I bet it holds true for two minutes." You could have saved so much work, because they would have just told you from their knowledge that it's impossible. I think there's a lot we can take away from that for data science as well. As for the question regarding critical systems, I think all of us can agree that if you go to aerospace engineering, or biomechanical engineering, and put ML in there, it should fall under the rules of critical systems, because it is critical. It can affect lives.

Meertens: Also, with the way people work with it, you have to be really aware of what you're doing. Talking about biomechanical engineering, one really recent example is that someone made a COVID dataset with X-rays of lungs. What they did is they took COVID patients and made an X-ray of their lungs. They also had a base dataset of people who didn't have COVID. The base dataset was actually a dataset of X-rays of children's lungs. The only thing the model had to do was determine whether it was looking at the lung of an adult or a child. This is this whole data explainability, where you have to be a really good machine learning engineer, and you have to be really aware of what is in your data. What am I actually learning? Can I see what my model learns? I think that's a really critical system, in this case. If you try to deploy that, it's really hard for a layperson or a doctor to notice what's going wrong.

Zweben: Experimenting is really important.

Jördening: If you as the data scientist, don't talk to a doctor, you would probably not see what's going on there, either. If it's normalized, I would probably not be able to distinguish a child's lung from an adult's lung, at least if it's normalized to whatever, 128 by 128 pixels.

Meertens: Especially if you only look at the recall and precision. In this case, you'd probably say, precision, one point, did a good job today.

Zweben: It really is a great example.
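The trap in the X-ray example can be reproduced with a few synthetic records. Everything below is invented for illustration: each record is just an (is_adult, has_covid) pair, and the "model" is a shortcut that only looks at age. On the confounded dataset the shortcut scores perfectly, which is why precision and recall alone would not reveal the problem.

```python
from typing import List, Tuple

Record = Tuple[bool, bool]  # (is_adult, has_covid), both synthetic


def precision_recall(y_true: List[bool], y_pred: List[bool]) -> Tuple[float, float]:
    """Compute precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def age_shortcut(is_adult: bool) -> bool:
    """A 'model' that has only learned adult versus child."""
    return is_adult


def evaluate(records: List[Record]) -> Tuple[float, float]:
    y_true = [has_covid for _, has_covid in records]
    y_pred = [age_shortcut(is_adult) for is_adult, _ in records]
    return precision_recall(y_true, y_pred)


# Confounded set: every COVID case is an adult, every healthy control is a child.
confounded = [(True, True)] * 4 + [(False, False)] * 4
# A fairer set that also contains healthy adults.
mixed = confounded + [(True, False)] * 4

confounded_scores = evaluate(confounded)
mixed_scores = evaluate(mixed)
```

Adding healthy adults to the evaluation set drops the shortcut's precision to one half, which is the kind of check that talking to a doctor about the data would prompt.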

The Validation Processes in ML and AI

On another topic, I'd love to hear about your validation processes from a driverless vehicle perspective, because there's a lot of press out there that's emotional. The Doomsday press, "We can't put AI systems into critical systems, because they'll take over the world, they'll do bad things. People will die." One of the interesting things that I see out there is the ability to essentially validate systems the same way we validate human experts. We let humans fly airplanes. We let humans drive cars. It's extraordinarily impossible to prove that a human won't make a mistake. Yet we let doctors perform surgery. We let people drive us around, because there's some level of certification that we've done that says, ok, they have an MD. They passed this exam. They did this much work in the real world, so we believe they're competent. Yet in the press, we say AI models, we need another level of competency in validation. What are your perspectives on, when can you declare a model as good enough to be safe for the real world?




Recorded at:

Apr 23, 2022