
Unified MLOps: Feature Stores and Model Deployment


Summary

Monte Zweben proposes a whole new approach to MLOps that makes it possible to scale models without increasing latency, by merging a database, a feature store, and machine learning.

Bio

Monte Zweben is the CEO and co-founder of Splice Machine, a provider of real-time machine learning and AI solutions, where he leads the team in their mission to make operational, real-time AI possible for their customers.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Zweben: I'm Monte Zweben, co-founder and CEO of Splice Machine. I'm here to talk about unified MLOps: bringing together feature stores and a unique approach to machine learning model deployment. Why do you need a feature store? Once companies get started with machine learning, it's usually pretty easy to get one or two models deployed into production. Then once you try to get tens, hundreds, or even thousands of models deployed, you hit a wall. It's too hard. It takes too many people. It takes too much coding. It's really hard to govern all of those models. Companies like Uber and Airbnb built bespoke mechanisms that let them blow through this wall: deploy hundreds or even thousands of models, and govern them carefully, reliably, and in a performant fashion.

Why It Is Hard

Why is this so hard? First, the productivity of data scientists is really limited, because feature engineering consumes everyone. It takes so much of the data scientist's time, and their corresponding data engineers' and machine learning engineers' time, to build the production data pipelines that develop the features they need for their models. Second, there is a real decline in predictive accuracy as you put more models into production, because training and serving data pipelines often become inconsistent with each other, and inconsistent in time. That leads to wrong predictions, and therefore really poor decisions for the enterprise. Third, there is model governance. We're constantly trying to answer the question: why did the model do that?

How Do You Use a Feature Store?

Feature stores solve these problems by providing four main capabilities. First, they allow you to reuse features: to discover work that's been done by colleagues and annotated with metadata, search for it, and then reuse that feature in your own models. Second, a feature store lets you create training sets with extremely little code, and do it repeatably and reliably at any point in the data science process. Third, feature stores can provide real-time serving of features for real-time models; this is often required in milliseconds for interactions with customers on the web or on mobile devices. Lastly, a feature store provides the transparency necessary, through end-to-end lineage, to really explain why a model did what it did.

Feature Reuse

Let's start with feature reuse, the first function. Without a feature store, data scientists get into their Python environments, usually in notebooks, build their features and feature pipelines, and then hand them off to engineers to translate their experimental work into robust SQL or Spark pipelines. With a feature store, you don't have that double work: data scientists can easily define production-ready features. Also, without a feature store, you find data scientists throughout the organization duplicating code, building features that are very similar to one another, and using redundant infrastructure to do it. With a feature store, you can provide an API that allows them to reuse each other's work. This is as easy as searching for some keywords or tags, looking at the results, clicking on the features of interest, examining the distribution of values for a feature to confirm it might be useful for your model, and then incorporating that feature directly into your models and experiments, as in the sketch below.
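To make that workflow concrete, here is a minimal sketch of feature discovery and reuse. The import path, the search_features method, and its arguments are assumptions for illustration, not the verified Splice Machine API.

```python
# A hypothetical feature-store client; names here are illustrative
# assumptions, not the verified Splice Machine API.
from splicemachine.features import FeatureStore  # assumed import path

fs = FeatureStore()

# Search shared features by keyword or tag, as described above.
candidates = fs.search_features(tags=["customer", "ecommerce"])

for feature in candidates:
    # Inspect metadata (and, via the UI, value distributions) to judge
    # whether a colleague's feature fits your model.
    print(feature.name, feature.description)
```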

Feature stores also enable the reuse of features that are more complicated than a simple ordinal value or string. Think about features that are actually complex aggregation pipelines: for example, recency, frequency, and monetary value (RFM) pipelines, where you're summarizing a sales history based on how frequently somebody buys, how recently they purchased, or what their average spend is. With a feature store, you can build, with very little code, repeatable aggregations that bucket summaries of customer behavior, or any aggregations, and have them be reused and kept fresh without having to write that code yourself.
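As a rough illustration of the kind of pipeline involved, here is a self-contained PySpark sketch of an RFM-style aggregation. The table and column names are made up, and a feature store would keep the result fresh automatically rather than you rerunning it by hand.

```python
# A self-contained PySpark sketch of an RFM-style aggregation; table and
# column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("c1", "2021-03-01", 40.0), ("c1", "2021-03-09", 60.0),
     ("c2", "2021-02-20", 25.0)],
    ["customer_id", "purchase_date", "amount"],
)

rfm = sales.groupBy("customer_id").agg(
    F.max("purchase_date").alias("last_purchase"),  # recency input
    F.count("*").alias("purchase_count"),           # frequency
    F.avg("amount").alias("avg_spend"),             # monetary value
)
rfm.show()
```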

Create Training Set

Creating training sets is really hard. Why? Because typically there are hundreds of lines of complex, error-prone SQL needed to connect up the different sources of data for the features and the labels correctly. With a feature store, it's as simple as specifying the join keys for the features to bring them together. You'll see this in operation in a moment. Also, feature values may be lost, which makes it impossible to build training sets in the future. With a feature store, a versioned time-series history is kept for every feature automatically. There's no need to write all of that code to maintain the feature history for developing training sets later on.

As an example, here is the code that's necessary to create a training set in the Splice Machine feature store. You simply specify a view of what a training set would look like, by providing some SQL for the label and some primary keys. You also specify a little bit of information about where the timestamp column is. Then you can create a training set, either as a SQL statement that can generate training examples, or as a Spark DataFrame; a hedged sketch follows below. This is very little code to create a training set from a set of features in the feature store. One of the most difficult things about training models is making sure the features that go into the model are point-in-time correct. What this means is that you have to align each feature so that each training example has the values of the features as they existed at the time of that example, at the time of that event. This is a subtle idea, but it requires a lot of coding that's almost always left to the data scientist to do in a bespoke manner.
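Before the point-in-time example, here is a hedged sketch of that training-view flow. The method names and signatures (create_training_view, get_training_set_from_view) are assumptions for illustration, not the verified Splice Machine API.

```python
# A hedged sketch of the steps just described; method names and
# signatures are illustrative assumptions.
from splicemachine.features import FeatureStore  # assumed import path

fs = FeatureStore()

# 1. Define a training view: SQL for the label, the primary keys, and
#    the timestamp column used for point-in-time alignment.
fs.create_training_view(
    name="purchase_view",
    sql="SELECT customer_id, event_ts, purchased FROM retail.events",
    primary_keys=["customer_id"],
    ts_column="event_ts",
    label_column="purchased",
)

# 2. Materialize a training set as a Spark DataFrame (per the talk, a
#    SQL statement can be returned instead).
train_df = fs.get_training_set_from_view(
    "purchase_view",
    features=["searches", "items_in_cart", "products_viewed"],
)
```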

For example, in an eCommerce world, you might have three simple features: how many searches did a customer do, how many items are in their shopping cart, and how many products were viewed. Over the course of a session, a number of predictions are made that end up being recommendations provided to the customer. At the time point when the first recommendation is provided, the prediction that the model comes up with requires the values of the features right at that time: 1 for the number of searches, 0 for the items in the cart, and 1 for the number of products that had been viewed at that point. This is really important, because you want to remember this example for retraining later on. At a later time point, the feature values have changed, at least for two of the features, and later still they've all changed.

If you don't have a feature store, you have to write your own code to remember the values of the features at every prediction that was made, to align that with any other data you're bringing together to train a model, align it with the timestamps on a label definition, and align it with other features that might be coming in. This proves to be difficult. What often happens is something called data leakage, where the wrong feature value gets into a training example. It would be as if that first training example had 2, 0, and 1 instead. That would be bad, because at that point in time the prediction didn't see a 2. The training example would have been a poor one, describing something that didn't happen in the real world. That's what leads to poor accuracy in models. Feature stores fix this problem, because they take care of it automatically: they line every feature up in time.
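To make point-in-time correctness concrete, here is a self-contained pandas sketch of the "as of" alignment a feature store performs for you: each prediction event picks up the latest feature value at or before its timestamp, never a later one.

```python
# Each prediction row gets the latest feature value known at that time;
# a later value leaking in would be exactly the data leakage described.
import pandas as pd

features = pd.DataFrame({
    "ts": pd.to_datetime(["10:00", "10:05", "10:12"]),
    "items_in_cart": [0, 1, 2],
}).sort_values("ts")

predictions = pd.DataFrame({
    "ts": pd.to_datetime(["10:03", "10:10"]),  # when recommendations fired
})

# merge_asof keeps only feature values known at prediction time.
aligned = pd.merge_asof(predictions, features, on="ts")
print(aligned)  # 10:03 -> 0 items in cart, 10:10 -> 1 (not the later 2)
```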

Serving Features

Serving features is a critical element, especially in real-time machine learning applications. Typically, without a feature store, the team needs to build a bespoke pipeline that feeds a key-value store, used to provide low-latency record lookup of a feature vector. With a feature store, you can use the same representation of features for both training and serving. There's also an interesting element here: an inevitable inconsistency arises when you have separate databases maintaining training-set information and feature-serving information. We'll dig into this architecturally. The Splice Machine feature store provides an ACID-compliant compute engine that can make sure the features used to construct training sets are always consistent with the features that are live, ready to be served. We'll see how we do that when we dig into what's under the hood. Feature serving from a feature store is as simple as calling get_feature_vector, and getting either the SQL to execute that feature retrieval, or a DataFrame back for a model pipeline in Spark. All of the APIs in the Splice Machine feature store let you return either a Spark DataFrame or a SQL statement to be executed; a sketch follows below.
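Here is a hedged sketch of that serving call. get_feature_vector is the call described in the talk, but this exact signature is an assumption, not the verified Splice Machine API.

```python
# A hedged sketch of low-latency feature serving; the signature is an
# illustrative assumption.
from splicemachine.features import FeatureStore  # assumed import path

fs = FeatureStore()

# A low-latency, single-record lookup of the current feature values
# for one entity (here a hypothetical customer key).
vector = fs.get_feature_vector(
    features=["searches", "items_in_cart", "products_viewed"],
    join_keys={"customer_id": "c42"},
    return_sql=False,  # True would return the SQL statement instead
)
```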

End-to-End Lineage

End-to-end lineage: how do we govern our models? Without a feature store, it's really hard to find the training set that was used for a model; you've got to do a lot of housekeeping to keep track of everything in the experimental world. With a feature store, you can retrieve the feature set that was used to construct a training set, do it repeatedly, and know that the exact same set of data will be returned each and every time. You can search through API logs to determine what predictions were made and what features were used without a feature store, but that gets to be pretty ugly code. With a feature store, you can just take the evaluation of a model and immediately see exactly what features were used in it. Feature stores provide a UI so that you can maintain your features and govern them. You can also monitor your features, which is pretty cool: you can monitor the distributions of the data in your features, both those used in training and those you're seeing live through feature serving.

How Does This Fit Into The ML Stack?

That's basically what a feature store does. Let's go into how it achieves that. How does it work? Feature stores can fit into the machine learning stack in a couple of ways. First, the Splice Machine feature store is part of an end-to-end machine learning system that has a modeling and experimentation system based on MLflow, and of course all of the capabilities of the feature store itself. It's built on top of a hybrid operational and analytical data platform. It's also a modular feature store. What does that mean? It means that if you're doing modeling and experimentation in another environment, let's say in Databricks, using Databricks notebooks and MLflow on Databricks, the feature store works directly with Databricks. It also works with Dataiku, or Domino Data Lab, or whatever machine learning environment you're using. The Splice Machine feature store can help you perform those four functions of a feature store. Data comes from anywhere: you can bring data in from Delta Lake, from cloud storage (perhaps as Parquet files), or from federated calls to data warehouses like Snowflake, and have those sources automatically update the features in the feature store. The feature store executes its operations through this data platform.

Problem: Disconnected Compute Engine

The problem with traditional feature stores, both the commercial ones and the bespoke implementations that have been built in the marketplace, is that they typically use disconnected compute engines, commonly called a Lambda architecture. The idea is this: you use one compute engine for fast operations, for low-latency feature serving, and for ingesting streaming events. You use an analytics engine for training-set creation, the aggregation pipelines, and batch ingestion of big jobs from, let's say, data warehouses or applications. The feature store's job is to try to keep these consistent. This is really hard; in fact, it's inevitable that they become inconsistent. This is what I was referring to before about the problem of keeping your features consistent for training and for serving.

Solution: Hybrid Operational/Analytical RDBMS

How do we solve this problem? The Splice Machine feature store is built on the Splice Machine RDBMS. The Splice Machine RDBMS is a hybrid transactional and analytical platform; we call it HTAP. What that means is that it can perform both kinds of workloads really well. If you're looking up a single feature vector, and it's a single-record lookup of a bunch of different features, or a single record from multiple tables joined on a join key, the cost-based optimizer of the Splice Machine database knows that this is a transactional type of workload. It will compile those instructions and dispatch them to a key-value store; we use Apache HBase under the covers of the feature store and database for the low-latency operations.

However, if you're doing a recency, frequency, and monetary value pipeline, with a ton of GROUP BYs over a table scan and a bunch of aggregations on top, that's going to be an analytical workload. It's going to scan many records. The cost-based optimizer can interrogate this query and fan it out: it will take the compiled instructions and execute them on Apache Spark. What we've done under the covers is seamlessly integrate Apache HBase and Apache Spark with a fully ACID-compliant transaction manager in a scale-out fashion, in order to support arbitrary scale of data with both kinds of workloads. In fact, it even supports full ANSI SQL with secondary indexes and triggers. This is critical for the performance of the feature store.
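As a sketch of what the optimizer sees, here are the two query shapes, with made-up table names. The routing described above is automatic; nothing in user code chooses HBase or Spark.

```python
# Two illustrative query shapes (table names are made up).
lookup_sql = """
    SELECT searches, items_in_cart, products_viewed
    FROM retail.customer_features
    WHERE customer_id = 'c42'   -- primary-key lookup: the OLTP/HBase path
"""

rfm_sql = """
    SELECT customer_id,
           MAX(purchase_date) AS last_purchase,
           COUNT(*)           AS purchase_count,
           AVG(amount)        AS avg_spend
    FROM retail.sales_history
    GROUP BY customer_id        -- scan plus aggregation: the Spark path
"""
```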

Feature Set - Two Tables in a Single Database

How does the feature store use this architecture? It works in the following way. We organize features into feature sets, and each feature set has two tables: one for the current values of the features, and another for all of the older values of the features, organized in a time-series fashion. Each of these tables has a primary key, so that you can look up an entity very quickly; maybe that's a customer ID or a transaction ID, depending on what you're modeling. The system maintains consistency using the ACID-compliant capabilities of the relational database. Here's how it works. From any data source, whether that's a Segment event library, a federated query out to Oracle or Snowflake, or an access to a Parquet table in cloud storage, the feature transformations that come from these sources can be managed in an event-based streaming or batch fashion using Apache Airflow. This data is changing, and it changes the current values of the features.

As these features change, triggers fire. A trigger is a small piece of SQL code that can transform the records that are changing and execute other SQL. What we do with the trigger is take the old values of the features, when you update some of them, and put those old values into the feature history table, which is indexed by the primary key plus a time-series timestamp. Now you can keep track of every old value in the feature set, and this is what we use for training. We serve features from the current-values table; that's feature serving, with low latency of a few milliseconds. The history table is used to construct the training sets, as sketched below. That's basically the idea, all kept consistent with an ACID-compliant trigger that's native to the relational database system.
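Here is a hedged sketch of that two-table layout and history trigger. The DDL is illustrative: the names are made up and the trigger syntax is approximate; Splice Machine generates the real schema and trigger for you.

```python
# Illustrative DDL for the current-values table, the history table, and
# the history trigger described above (names and syntax approximate).
feature_set_ddl = """
CREATE TABLE retail.customer_fs (          -- current feature values
    customer_id    VARCHAR(32) PRIMARY KEY,
    items_in_cart  INT,
    last_update_ts TIMESTAMP
);

CREATE TABLE retail.customer_fs_history (  -- time series of old values
    customer_id    VARCHAR(32),
    items_in_cart  INT,
    asof_ts        TIMESTAMP,
    PRIMARY KEY (customer_id, asof_ts)
);

-- On update, copy the old row into the history table.
CREATE TRIGGER customer_fs_hist
AFTER UPDATE ON retail.customer_fs
REFERENCING OLD AS o
FOR EACH ROW
INSERT INTO retail.customer_fs_history
VALUES (o.customer_id, o.items_in_cart, o.last_update_ts);
"""
```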

MLOps

Feature stores are one piece of the equation. When you combine feature stores with a modern approach to MLOps, you get the governance we've talked about. What do I mean by that? Let's start with model deployment. When you're ready to prove your model and push it into production, the typical approach is to wrap that model in an API, potentially containerize it, and push it out in a DevOps fashion to some autoscaled system that can serve that endpoint. Typically, that might be a cloud system like SageMaker or Azure ML, or perhaps just a Kubernetes cluster. We do that, but we also provide a new approach to model deployment, something that's native to the database and quite unique. We call this database model deployment. All of these techniques are available to the user of the Splice Machine feature store.

Database Deployment

I'd like to take a closer look at what database deployment is. What does it mean? When you say deploy model inside your Jupyter notebook, the Splice Machine system interrogates the model, serializes it, puts it into the database, and makes it available in real time through triggers. All you need to do to execute the model and generate a prediction is insert records into the prediction table, with the columns for the features populated. Through database triggers, the system automatically runs the model on those features and places the result back into the same record. What's happening here is that new records coming into the prediction table automatically trigger predictions. This is all done with very little code: a single click, or one function call in a notebook. A hedged sketch follows below.
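Here is what using a database-deployed model might look like; the table and column names are assumptions for illustration. Inserting a feature row fires the trigger that runs the cached model and writes the prediction back onto the same record.

```python
# Illustrative SQL for consuming a database-deployed model; names are
# made up for the sketch.
insert_sql = """
INSERT INTO retail.churn_predictions (customer_id, searches, items_in_cart)
VALUES ('c42', 3, 1)
"""

# After the trigger fires, the prediction sits on the same row:
read_sql = """
SELECT prediction
FROM retail.churn_predictions
WHERE customer_id = 'c42'
"""
```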

This provides two things. First, it's really easy. It's easy for data scientists to push their models out without having to negotiate RESTful APIs, do endpoint programming, or work with machine learning engineers to get the model out there. They can push it into the database, and consumers of the model can access it simply by putting records in a database, which every developer does all the time. The second thing that's really powerful about database deployment is that it essentially becomes an evaluation store. What that means is that you have a full history that memorializes every prediction that was made, every feature that was used, and which model was used to make that prediction. This is what gives you end-to-end governance and that transparency of lineage.
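Because every prediction is memorialized as a row, lineage queries become plain SQL. A sketch, with illustrative table and column names:

```python
# A hypothetical lineage query over the prediction history.
lineage_sql = """
SELECT customer_id, searches, items_in_cart,  -- features used
       prediction,                            -- what the model said
       model_id, eval_ts                      -- which model, and when
FROM retail.churn_predictions
WHERE model_id = 'churn_rf_v3'
ORDER BY eval_ts
"""
```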

Benefits

What does this all mean? The benefit of a feature store and database deployment combined is that your team can be as much as 100 times faster at every stage of the data science process. In the first stage, data prep, the automated aggregation pipelines truly shorten the cycle it takes to build aggregation pipelines like RFM. In feature engineering, being able to leverage features that were developed before, and build your models on work that's already proven, greatly reduces the data scientist's overhead. In experimentation, the construction of training sets has been simplified to single lines of code, as opposed to tirelessly trying to keep your training sets point-in-time correct and making sure the training-set creation is the same code as what's used in deployment. Then, of course, deployment is so much easier when you simply enter records into the database and the models run for you.

Coming in 2021

We're so excited about this feature store idea that we're writing a book on it, coming in 2021. It'll be a comprehensive text with lots of exercises. We look forward to getting it into your hands, so you can get a really in-depth view of what a feature store can do.

Summary and Next Steps

We improve the productivity of the entire data science process, because feature engineering is now streamlined and automated, as is deployment. The models that result from a data science team using a feature store are much more predictive, because the training and serving pipelines are guaranteed to be consistent and repeatable: consistent with each other, and consistent in time. From a governance perspective, you'll never have to worry about that regulator, that audit, or just the ability to understand what your models are really doing, because you'll have full transparency. You can come to our website and get a demo of the feature store, or even try it yourself. If you're interested in working on projects like this, you can certainly sign up.

Questions and Answers

Jördening: You said that you would just need to add the features to the table to basically get the prediction. How does that affect the latency? Because, in the end, you probably fire an event, it has to go to the compute engine, and then you need to wait for a promise to get the reply.

Zweben: It's actually quite fast. The relational database management system has an OLTP component, meaning it delivers transactional latencies: a few milliseconds for a single-record lookup. The latency of getting a prediction is roughly at that level, with very little overhead. The model is cached in memory on each of the database servers. As you enter a record into that table, the trigger fires automatically. It marshals the columns of that record, which are basically features, runs them through the model, and then inserts the result back into the database. Especially when you batch these together, with a lot of things happening at once, we've seen just a couple of milliseconds per prediction on average across a large set of predictions. It's quite fast.

Jördening: An attendee is curious about the technical approach of centralizing every model inside a database, because microservices say decentralize, and we're basically going back. You also said that you would cache every model in memory on each database server.

Zweben: The microservices design pattern is an excellent design pattern, but taken to the extreme, in general (forget machine learning), I think it's flawed. It doesn't make sense if you take the microservices model strictly, in an orthodox fashion, where every microservice has its own little database. The problem, especially for machine learning applications, is that this pattern breaks down. Why? Because machine learning models like to blend data from multiple objects and multiple entities together, in order to find signal in the patterns being displayed. Centralizing your data is clearly a best practice. The microservices model needs to be taken with a grain of salt.

Now, taking it to the database deployment perspective, there are benefits to centralizing this. You can still create a microservices API on top of these deployed model tables, where each model or each domain area only accesses certain tables; you can even wrap an API around that. The beauty of centralizing in the database is latency. It's backup and recovery. It's the memorialization from a governance perspective, where you have a central place to get that lineage. There are so many advantages to using the tried-and-true database technology out there that this pattern is better than a strict view of microservices. It's an opinion; others have other religions, but that's our perspective.

Jördening: I think at the last QCon, or the one before, we had a talk, "To Microservices and Back," where they had exactly this discussion: you increase latency when you go to microservices, in exchange for gaining separation. You always have this tradeoff between A and B. How much memory do you have on one of those database servers? Because data scientists' models tend to get bigger. Even the ERP systems that are now deployed on 14-terabyte memory servers reach their limit at some point.

Zweben: There's an interesting thing happening in the marketplace, between what I'll call level-one data science and level two. Level one is where a data scientist is happy working on their own machine, and brings their model and their data all into Pandas, in memory on one machine. You can get a lot done with that. As you start to mature, you just can't fit your datasets into one machine, in one environment. The relational database underlying our feature store is based on Spark, and we're strong believers in Spark. Machine learning becomes a distributed approach, and everything you could do to a Pandas DataFrame you can more or less do with Spark DataFrames, but in a distributed fashion where the executors operate on your transformations and actions in parallel (see the small sketch below). The database operates the same way in Splice Machine, with Spark underneath it. You don't have to interact directly with Spark if you don't want to, you can just issue SQL, but you can.
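A tiny runnable illustration of that point: the same transformation expressed on a single-machine pandas DataFrame and on a distributed Spark DataFrame.

```python
# The same column transformation in pandas (one machine) and Spark
# (distributed across executors).
import pandas as pd
from pyspark.sql import SparkSession, functions as F

pdf = pd.DataFrame({"amount": [40.0, 60.0, 25.0]})
pdf["amount_x2"] = pdf["amount"] * 2                     # pandas, in memory

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame(pdf[["amount"]])
sdf = sdf.withColumn("amount_x2", F.col("amount") * 2)   # Spark, distributed
```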

The bottom line is that we don't require each server to have terabytes of memory. In fact, we take advantage of a distributed architecture so that you can stay relatively commoditized in your hardware infrastructure. On the other hand, we configure any workspace declaratively, so you can create any cluster you want. If there's a use case that will benefit from terabyte servers in a distributed cluster, or more, you can do that. The beauty of the underlying feature store architecture and ML system is that you get to configure your distribution in each instance any way you want. If somebody wants those crazy, big, fat servers, they can have them, but they don't have to.

Jördening: Basically, when you enter the features into your table and you get the response or the prediction back, is it automatically selecting the server that has the model in memory?

Zweben: Let me take a step back on the database, and then I'll talk about the model. The database itself is a distributed database. You can put a load balancer in front of it; when you issue SQL, the queries are distributed in a random fashion to any one of the servers. Each server can process SQL. Each one optimizes the query and then dispatches the work to every server in a shared-nothing way. As you deploy models, the cached model ends up on every server out there. Then whichever one you connect to and ask for the model, it'll be executed on that particular node. So even for model deployment and model inference, we're distributing across the multiple nodes.

Jördening: I was just wondering, if you have them in memory, how you clear the cache and keep models in cache to decrease the latency. That's something we saw on our side: every time you jump to the next service or the next node, you lose time, because HTTP takes time.

Zweben: Those hops across the network are something you have to be careful about. Our feature set API does the caching automatically, so you don't have to manage it. Frankly, I think it's a very good question what the latency would be if you had thousands of models in production and your system had to swap models in and out of that cache; what would the latency be as it deserializes a model? I frankly don't have numbers on that. I think that's a good experiment to go check.

Jördening: By centralizing, you have both types of data, the online predictions and the batch, written in the same database. It's unified. You don't have it split.

Zweben: Yes, they are. I didn't use this terminology, but most feature stores have an online store and an offline store. We deliberately differentiated the architecture we came up with, taking a contrarian view: we have one feature store that performs both functions, and it knows which of the compute engines to use for the different processes at the database level, and can thereby separate those kinds of computation.

Won't the analytical processing on the database, if you're executing a pipeline, clobber the performance of the serving? The beauty of it is that they're on separate JVMs. You have one JVM operating the OLTP store, the key-value store, which is HBase, and another operating the Spark executors. They communicate under the covers in a very optimized way to create DataFrames from HBase, but they compute in separate memory spaces; they're fairly isolated. Where they do have resource overlap is at the storage level, with everything depending on the disks to access the data at the same time. There will be contention on the RPCs to access storage, but it's measurable: you can architect for that overhead, and it's fairly constant.

 


Recorded at: May 13, 2022
