InfoQ News

Uber Migrates Eats Catalog to Apache Pinot for Near-Real-Time OLAP at 10B+ Row Scale

On December 9, 2025, Uber Engineering announced a major upgrade to its Eats Catalog architecture that significantly improves the experience for merchants and support staff. By adopting Apache Pinot as a real-time indexing layer, Uber reduced data freshness latency from several hours to 5-10 minutes and brought query response times for its 10-billion-record catalog down to 1-3 seconds. The new OLAP system replaces a legacy batch-oriented infrastructure based on Apache Hive. The architecture is documented on the Uber Engineering blog.

Internally, the catalog system is known as INCA (INventory and CAtalog). Before the change, the primary pain point for INCA users was how long updates took to appear in the catalog. Because Hive is inherently batch-oriented, merchants on the legacy system had to wait hours for menu or price changes to show up in search results. With Pinot, Uber can offer its merchants interactive management tools in which their changes become visible within a few minutes.

Rather than using Pinot as a primary database, Uber treats it as the speed layer of a Lambda Architecture. An in-house database called Docstore remains the source of truth for ACID transactions. A new component, the Inca-Indexer service, acts as the bridge to Pinot, flattening complex nested data from Docstore into a format Pinot can ingest. Apache Kafka serves as the transport layer, so the indexing process places no direct load on the primary production database.
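To make the flattening step more concrete, here is a minimal Python sketch of turning one nested catalog document into flat, one-row-per-item records of the kind a columnar store such as Pinot ingests. The document shape, the `items` key, and every field name are hypothetical; the article does not describe Docstore's schema or the Inca-Indexer's actual logic.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into a single level, joining keys with '_'."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


def to_rows(store_doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit one flat row per menu item, repeating the store-level fields.

    The 'items' key and all field names are illustrative assumptions,
    not Uber's real schema.
    """
    store_fields = flatten({k: v for k, v in store_doc.items() if k != "items"})
    return [{**store_fields, **flatten(item)} for item in store_doc.get("items", [])]


doc = {
    "store_id": "s1",
    "location": {"city": "SF"},
    "items": [
        {"item_id": "i1", "price": {"amount": 999, "currency": "USD"}},
        {"item_id": "i2", "price": {"amount": 450, "currency": "USD"}},
    ],
}
rows = to_rows(doc)
```

In a pipeline like the one described, each flat row could then be serialized (for example with `json.dumps`) and published to a Kafka topic that Pinot consumes. Repeating the store-level fields on every item row costs some storage but avoids joins at query time.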

The most critical Pinot feature for this project was full upserts. In a catalog of 10 billion items, prices and availability change constantly, and a traditional append-only OLAP store would have to scan and deduplicate every stored version of an item at query time to find the latest one. To enable Pinot upserts, Uber maintains a primary key for every item. When a price update arrives, Pinot locates the previous record for that key and marks it as stale in an in-memory bitmap, so queries see only the most recent state.
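The following simplified Python model illustrates the bookkeeping behind full upserts: every row is appended, a map from primary key to row ID tracks the latest version, and a validity bitmap (a plain list of booleans here) hides superseded rows. It is not Pinot's implementation; the `UpsertTable` class, the `item_id` key, and the `updated_at` ordering column are illustrative assumptions. Pinot upserts can order versions by a comparison column (the time column by default) so that a late-arriving older record does not replace a newer one, and the model mimics that with `updated_at`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class UpsertTable:
    """Simplified model of full-upsert bookkeeping (not Pinot's actual code)."""

    rows: list[dict[str, Any]] = field(default_factory=list)
    valid: list[bool] = field(default_factory=list)       # one "bit" per row
    latest: dict[str, int] = field(default_factory=dict)  # primary key -> row id

    def ingest(self, row: dict[str, Any], pk: str = "item_id", ts: str = "updated_at") -> None:
        row_id = len(self.rows)
        self.rows.append(row)  # storage stays append-only
        key = row[pk]
        prev = self.latest.get(key)
        if prev is not None and self.rows[prev][ts] > row[ts]:
            # Late, older version: stored, but never visible to queries.
            self.valid.append(False)
            return
        self.valid.append(True)
        if prev is not None:
            self.valid[prev] = False  # mark the previous version stale
        self.latest[key] = row_id

    def query(self) -> list[dict[str, Any]]:
        """Scan only rows whose validity bit is set; no per-key dedup needed."""
        return [r for r, ok in zip(self.rows, self.valid) if ok]


table = UpsertTable()
table.ingest({"item_id": "i1", "price": 999, "updated_at": 1})
table.ingest({"item_id": "i2", "price": 450, "updated_at": 1})
table.ingest({"item_id": "i1", "price": 899, "updated_at": 2})   # price update
table.ingest({"item_id": "i1", "price": 1099, "updated_at": 0})  # late arrival
```

In this model the read path only checks one flag per row, so reading the current state is an ordinary filtered scan; the cost moves to the memory held by the key map and the bitmap.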

A notable architectural hurdle Uber faced was the Small Segment Problem. Because catalog updates are frequent and spread across many Kafka partitions, Pinot initially created thousands of tiny data segments. This led to "segment exhaustion," in which the overhead of managing thousands of files at query time degraded performance.

To solve this, Uber built the Small Segment Merger (SSM), a background Pinot Minion task that continuously merges these tiny real-time segments into larger, optimized segments. This change alone reduced Uber's storage footprint by 40% and improved query response times by 75%.
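The article does not describe how SSM chooses which segments to merge. As an illustration only, here is a Python sketch of one plausible planning step: greedily grouping small segments, in input order, into batches whose combined row count stays at or below a target, and skipping segments that are already large enough. The `Segment` type, the `target_rows` threshold, and the greedy policy are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    num_rows: int


def plan_merges(segments: list[Segment], target_rows: int) -> list[list[Segment]]:
    """Greedily group small segments into merge batches (hypothetical policy).

    Segments at or above target_rows are left alone. A batch is closed when
    adding the next segment would push it past target_rows. Single-segment
    batches are dropped, since there is nothing to merge.
    """
    batches: list[list[Segment]] = []
    current: list[Segment] = []
    current_rows = 0
    for seg in segments:
        if seg.num_rows >= target_rows:
            continue  # already large enough
        if current and current_rows + seg.num_rows > target_rows:
            batches.append(current)
            current, current_rows = [], 0
        current.append(seg)
        current_rows += seg.num_rows
    if current:
        batches.append(current)
    return [b for b in batches if len(b) > 1]


segs = [
    Segment("s0", 100), Segment("s1", 300), Segment("s2", 5000),
    Segment("s3", 400), Segment("s4", 250), Segment("s5", 600),
]
plan = plan_merges(segs, target_rows=1000)
```

Here `s2` is skipped as already large, and the remaining five small segments collapse into two batches. A production merger would likely also need to respect partition boundaries and, for upsert tables, carry over only rows still marked valid; those details are omitted.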

The Uber team made several contributions to the Apache Pinot project: the Small Segment Merger (SSM), UUID hash binning, and the migration to JRE 17. The Apache Pinot project maintains a highly active and diverse ecosystem, with significant contributions from major technology companies including LinkedIn, Uber, Stripe, and Walmart. Its health is reflected in a consistent commit velocity and a robust community that frequently releases updates focused on indexing techniques, query optimization, and integration with the broader streaming data stack.