
Airbnb Open-Sources its ML Feature Platform Chronon


Chronon, Airbnb's platform that creates the infrastructure required to transform raw data into ML-ready features, is now open source. As Airbnb ML infrastructure engineer Varant Zanoyan explains, Chronon supports a variety of data sources and aims to provide low-latency streaming.

We built Chronon to relieve a common pain point for ML practitioners: they were spending the majority of their time managing the data that powers their models rather than on modeling itself.

Feature engineering tends to follow one of two slightly different approaches, depending on whether it starts from training or from inference, explains Zanoyan. For training, it is usually easier to start from data available in the data warehouse, while for inference, data extracted from logs is more readily available. Either way, extra work is required to cover the other task: converting log data to the data warehouse format in the first case, or accumulating enough log data before it can be used for modeling in the second.

To simplify the task of ML practitioners, says Zanoyan:

Chronon requires ML practitioners to define their features only once, powering both offline flows for model training as well as online flows for model inference.

Chronon accomplishes this by distinguishing between batch features and streaming features. Batch features are calculated daily; streaming features are updated in real time and are also included in a batch job. A detailed explanation of how this works goes beyond what can be covered here. Suffice it to say that Chronon provides an API to create and fetch both kinds of features with low latency. The API can be used through clients in Java, Scala, and Python.
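To make the "define once" idea more concrete, here is a minimal sketch in plain Python. It does not use Chronon's API: `Event`, `FeatureDef`, `offline_backfill`, and `OnlineStore` are hypothetical names invented for illustration, and the in-memory lists stand in for the warehouse and the event stream. It only shows how a single feature definition could, in principle, drive both a training backfill and an online lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    ts: int        # event time, in seconds
    amount: float


@dataclass(frozen=True)
class FeatureDef:
    """Hypothetical feature definition shared by the offline and online paths."""
    name: str
    window: int    # window length, in seconds

    def compute(self, events, user, as_of):
        # Sum of amounts for `user` in the window (as_of - window, as_of].
        return sum(
            e.amount
            for e in events
            if e.user == user and as_of - self.window < e.ts <= as_of
        )


def offline_backfill(feature, events, training_rows):
    """Training path: evaluate the feature as of each (user, timestamp) row."""
    return [(user, ts, feature.compute(events, user, ts)) for user, ts in training_rows]


class OnlineStore:
    """Serving path: a batch snapshot plus events streamed in afterwards."""

    def __init__(self, feature, batch_events):
        self.feature = feature
        self.events = list(batch_events)  # copy, so the snapshot is not mutated

    def ingest(self, event):
        self.events.append(event)

    def fetch(self, user, now):
        return self.feature.compute(self.events, user, now)


# Assumed sample data, for illustration only.
events = [Event("u1", 10, 5.0), Event("u1", 50, 7.0), Event("u2", 55, 1.0)]
feature = FeatureDef(name="spend_last_60s", window=60)

training = offline_backfill(feature, events, [("u1", 60), ("u1", 100)])

store = OnlineStore(feature, events)
store.ingest(Event("u1", 90, 3.0))   # a new event arriving from the stream
online_value = store.fetch("u1", 100)
```

Because both paths call the same `compute` method, the value served online for a given timestamp matches what a later backfill over the same events would produce, which is the consistency property a shared definition is meant to give.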

Zanoyan says work on Chronon does not end with its open sourcing. The goals are to lower the cost of iteration and computation even further, make it easier for practitioners to define features using NLP, and add some intelligence to help practitioners build better models.

If you want to start using Chronon, head to the quickstart guide, which provides an example implementation of a model and explains how to use the two main components of its API, GroupBy and Join.

Under the hood, Chronon builds pipelines using Kafka, Spark/Spark Streaming, Hive, and Airflow. In addition to GroupBy and Join, Chronon supports StagingQuery and provides several aggregations, including windows, buckets, and other time-based aggregations. Additionally, it provides support for advanced feature computation, such as feature derivations, feature chaining, and external and contextual features.
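For readers unfamiliar with windowed and bucketed aggregations, the following plain-Python sketch illustrates the general idea. It is not Chronon code: the function name `windowed_bucketed_sum` and the row fields (`key`, `ts`, `value`, `category`) are assumptions made for this example.

```python
from collections import defaultdict


def windowed_bucketed_sum(rows, as_of, window, bucket_field):
    """Sum `value` per key and per bucket over the window (as_of - window, as_of].

    `rows` is an iterable of dicts with 'key', 'ts', 'value', and the field
    named by `bucket_field`. All names here are hypothetical.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if as_of - window < row["ts"] <= as_of:
            totals[row["key"]][row[bucket_field]] += row["value"]
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {key: dict(buckets) for key, buckets in totals.items()}


# Assumed sample data, for illustration only.
rows = [
    {"key": "u1", "ts": 5,  "value": 2.0, "category": "a"},   # outside the window
    {"key": "u1", "ts": 95, "value": 3.0, "category": "a"},
    {"key": "u1", "ts": 98, "value": 4.0, "category": "b"},
    {"key": "u2", "ts": 99, "value": 1.0, "category": "a"},
]

result = windowed_bucketed_sum(rows, as_of=100, window=10, bucket_field="category")
# result == {"u1": {"a": 3.0, "b": 4.0}, "u2": {"a": 1.0}}
```

The window restricts which events count toward the feature, while the bucket splits a single aggregate into one value per category, so one definition yields a small map of features per key.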
