LinkedIn Open-Sourced Its Feature Store to Evangelize Productive Machine Learning

LinkedIn Engineering recently open-sourced its feature store, Feathr, which helps engineers develop machine learning products by simplifying feature management and usage in production.

Feathr is the data management layer for machine learning applications. It defines features, computes them for training and inference, and makes them discoverable by other machine learning developers. It helps teams scale and manage machine learning products by reducing the effort spent on common feature generation, maintenance, and observability steps.

As shown in the following picture, machine learning feature generation pipelines need to bring together different time-sensitive data sources and join them. The resulting features are persisted in databases or caches for training and inference (real-time or batch). Consistency is very important in this process: features should be prepared in the same way for training and for inference, to avoid mismatches between the two and data leakage in the machine learning models.

General Machine Learning Feature Generation and Inferencing Pipelines
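One common way to get this consistency is to define each feature once and call the same definition from both the batch (training) path and the online (inference) path. The following is a minimal sketch of that idea in plain Python. The feature `days_since_last_login` and the helpers `build_training_rows` and `build_serving_value` are hypothetical names chosen for illustration; they are not part of Feathr.

```python
from datetime import datetime


def days_since_last_login(last_login: datetime, as_of: datetime) -> int:
    """Single feature definition shared by the training and serving paths."""
    return (as_of - last_login).days


def build_training_rows(events):
    """Batch path: compute the feature for each historical event."""
    return [
        days_since_last_login(e["last_login"], e["label_time"]) for e in events
    ]


def build_serving_value(last_login: datetime, now: datetime) -> int:
    """Online path: compute the same feature for a single request."""
    return days_since_last_login(last_login, now)


# Hypothetical historical events used for training.
events = [
    {"last_login": datetime(2022, 4, 1), "label_time": datetime(2022, 4, 11)},
    {"last_login": datetime(2022, 4, 5), "label_time": datetime(2022, 4, 8)},
]

training_values = build_training_rows(events)
serving_value = build_serving_value(datetime(2022, 4, 1), datetime(2022, 4, 11))
```

Because both paths call the same function, a change to the feature logic reaches training and inference together, rather than drifting apart in two separate implementations.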

Feathr is an abstraction layer that provides a common namespace for defining, computing, serving, and discovering machine learning features. Its high-level architecture resembles a producer-consumer model: producers define, generate, and register machine learning features, and consumers use those features in training and inference. Feathr has a simple programming model. Developers just provide the names of the features they want to import and use in their machine learning models. All the background work, such as how the data should be sourced and computed, happens inside Feathr. As the LinkedIn blog post puts it:
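The producer-consumer model described above can be illustrated with a toy in-memory registry. This is only a sketch of the general pattern, not Feathr's actual API: the class `FeatureRegistry`, its methods, and the feature names are all assumptions made for this example.

```python
from typing import Callable, Dict, List


class FeatureRegistry:
    """Toy in-memory registry; a hypothetical stand-in, not Feathr's API."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[dict], float]] = {}

    def register(self, name: str, compute: Callable[[dict], float]) -> None:
        """Producer side: publish a feature under a unique name."""
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = compute

    def list_features(self) -> List[str]:
        """Discovery: list every registered feature name."""
        return sorted(self._features)

    def get_features(self, names: List[str], entity: dict) -> Dict[str, float]:
        """Consumer side: request features by name only."""
        return {name: self._features[name](entity) for name in names}


registry = FeatureRegistry()

# Producers define and register features (names are illustrative).
registry.register(
    "member.connection_count", lambda m: float(len(m["connections"]))
)
registry.register(
    "member.profile_completeness",
    lambda m: m["filled_fields"] / m["total_fields"],
)

# A consumer asks for features by name, without knowing how they are computed.
member = {"connections": ["a", "b", "c"], "filled_fields": 8, "total_fields": 10}
features = registry.get_features(
    ["member.connection_count", "member.profile_completeness"], member
)
```

The consumer code depends only on feature names, so a producer can change how a feature is sourced or computed without touching the models that use it.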

Under the hood, Feathr figures out how to provide the requested feature data in the required way for model training and production inferencing. For model training, features are computed and joined to input labels in a point-in-time correct way, and for model inferencing, features are pre-materialized and deployed to online data stores for low-latency online serving. Features defined by different teams and projects can easily be used together, enabling collaboration and reuse.
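To make "point-in-time correct" concrete: for each training label, the join should use the most recent feature value that was observed at or before the label's timestamp, and never a later one, since a later value would leak future information into training. The sketch below shows one way to do this with the standard library. The function `point_in_time_join`, the tuple layouts, and the integer timestamps are assumptions for illustration and do not reflect how Feathr implements the join.

```python
from bisect import bisect_right


def point_in_time_join(labels, feature_history):
    """Attach to each label the latest feature value seen at or before it.

    labels:          iterable of (entity_id, label_timestamp, label)
    feature_history: iterable of (entity_id, observed_timestamp, value)
    """
    # Group feature observations per entity, ordered by timestamp.
    by_entity = {}
    for entity_id, ts, value in sorted(feature_history, key=lambda r: (r[0], r[1])):
        times, values = by_entity.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for entity_id, label_ts, label in labels:
        times, values = by_entity.get(entity_id, ([], []))
        # Count of observations with timestamp <= label_ts.
        idx = bisect_right(times, label_ts)
        feature = values[idx - 1] if idx > 0 else None
        joined.append((entity_id, label_ts, feature, label))
    return joined


# Hypothetical data with integer timestamps.
feature_history = [
    ("u1", 5, 0.2),
    ("u1", 1, 0.1),
    ("u1", 9, 0.9),
    ("u2", 4, 0.5),
]
labels = [("u1", 6, 1), ("u1", 0, 0), ("u2", 4, 1)]

rows = point_in_time_join(labels, feature_history)
```

In this example the label for `u1` at time 6 is paired with the value observed at time 5, not the one from time 9, and the label at time 0 gets no feature value because nothing had been observed yet.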

As part of this announcement, LinkedIn Engineering open-sourced Feathr on GitHub and made it available on Azure (Microsoft's cloud platform) for developers.

A feature store is one of the essential services in machine learning operations (MLOps). It speeds up the development of machine learning-enabled products and makes them more accessible across an enterprise. There is a dedicated community around this topic, which also holds its own summit.

Amazon SageMaker Feature Store (part of AWS's machine learning service) and Google Cloud Vertex AI are examples of feature store solutions on public clouds. Other feature stores include Databricks Feature Store and the open-source projects Feast and Hopsworks.
