Apache Pinot 1.0 Provides a Realtime Distributed OLAP Datastore

Apache Pinot is an open source, column-oriented, distributed data store written in Java. It is designed for online analytical processing (OLAP), answering multi-dimensional analytical (MDA) queries with low latency.

Pinot started as an internal project at LinkedIn in 2013 to power analytic solutions and was open-sourced in June 2015 under the Apache 2.0 license. The project became part of the Apache Software Foundation in June 2019.

The community closed three hundred issues in the year before the 1.0 release, covering new features, performance improvements and bug fixes. The project currently has more than 1.3 million lines of code on GitHub, written by over three hundred contributors.

Apache Pinot is best suited for analytics on immutable data ingested in real time, especially when querying time series data with multiple dimensions and metrics. The project uses Apache Helix as an embedded agent for cluster management and Apache Zookeeper for coordination and maintenance of the cluster's state and health.

Pinot can filter and aggregate petabytes of data with P90 latencies in the tens of milliseconds. Data may be ingested in real time with streaming solutions such as Apache Kafka, Apache Pulsar and AWS Kinesis, and in batch with Apache Hadoop, Apache Spark and AWS S3. Pinot is horizontally scalable and fault tolerant. The data can be queried with the Pinot Query Language (PQL), SQL, or the Trino and Presto SQL query engines. PQL supports functionality similar to SQL: selection, aggregation, grouping, ordering and filtering.

One of the key features of this release is the functional completeness of the multi-stage query engine. The default query execution engine was never optimized for complex queries such as distributed joins and window operations. The multi-stage query engine supports multi-stage operators such as distributed joins and windowing in real time with a new query plan optimizer which minimizes data shuffling. Apache Pinot's documentation explains how to enable the multi-stage query engine.
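As a rough illustration of opting in per query, the Python sketch below builds a JSON request body that carries the `useMultistageEngine=true` query option. The option name follows Pinot's documentation as understood here and should be checked against the documentation for the version in use. The `orders` and `customers` tables and the `multistage_payload` helper are hypothetical and exist only for this example.

```python
import json

# Assumption: hypothetical tables `orders` and `customers`, used only to
# illustrate a distributed join; they are not part of any Pinot sample data.
JOIN_SQL = (
    "SELECT c.region, COUNT(*) AS orderCount "
    "FROM orders o JOIN customers c ON o.customerId = c.customerId "
    "GROUP BY c.region"
)


def multistage_payload(sql, enabled=True):
    """Return the JSON body for a query, optionally opting in to the
    multi-stage engine.

    Assumption: the engine is selected per query through the
    `useMultistageEngine` query option.
    """
    payload = {"sql": sql}
    if enabled:
        # Query options travel as a single "key=value" string.
        payload["queryOptions"] = "useMultistageEngine=true"
    return json.dumps(payload)
```

Calling `multistage_payload(JOIN_SQL)` yields a body that could be posted to a Pinot query endpoint. Passing `enabled=False` leaves the option out, so the query would run on the default engine.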

The Getting Started guide describes running Pinot locally, in Docker, in Kubernetes or on the Azure, GCP or AWS public clouds. The following command may be used to run Pinot with a pre-loaded baseball dataset:

docker run \
	-p 9000:9000 \
	apachepinot/pinot:0.12.0 QuickStart \
	-type batch

The Quick Start Examples documentation provides more information on the different examples and all the available start commands.
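Once the container is running, the pre-loaded data can be queried over HTTP. The following Python sketch is one possible way to do that, under several assumptions: that the controller published on port 9000 accepts SQL via `POST /sql`, that the quickstart table is named `baseballStats` with columns `playerName`, `runs` and `yearID`, and that responses carry a `resultTable` with `dataSchema.columnNames` and `rows`. The helper names are made up for this example. The query itself combines selection, aggregation, grouping, ordering and filtering.

```python
import json
import urllib.request

# Assumption: the controller is reachable on the port published by the
# docker command and accepts SQL through POST /sql.
CONTROLLER_URL = "http://localhost:9000"

# Assumption: table and column names follow the quickstart baseball dataset.
TOP_SCORERS_SQL = (
    "SELECT playerName, SUM(runs) "
    "FROM baseballStats "
    "WHERE yearID >= 2000 "
    "GROUP BY playerName "
    "ORDER BY SUM(runs) DESC "
    "LIMIT 5"
)


def build_request(sql, base_url=CONTROLLER_URL):
    """Build (but do not send) an HTTP POST carrying the SQL as JSON."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Pair each row in the response's resultTable with its column names."""
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql, base_url=CONTROLLER_URL):
    """Send the query and return decoded rows; needs a running Pinot."""
    request = build_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return rows_as_dicts(json.load(resp))
```

With the quickstart up, `run_query(TOP_SCORERS_SQL)` would return a list of dictionaries, one per player. Keeping request construction separate from the network call makes the payload easy to inspect without a running cluster.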

Further details can be found in the release notes and the announcement of Apache Pinot 1.0. Tim Berglund, vice president of developer relations at StarTree, introduced Apache Pinot 1.0 in a YouTube video that also explains Apache Pinot in general. Sessions are regularly organized through the Apache Pinot Meetup Group and questions may be asked on Slack.
