
Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid


Netflix has improved Apache Druid query efficiency by introducing an interval-aware caching strategy that serves approximately 84 percent of analytics results from cache and reduces query load by about 33 percent. The optimization targets rolling window dashboards, where continuously refreshing queries with slightly shifting time ranges traditionally lead to redundant computation and repeated scans over large datasets.

At Netflix scale, real-time analytics systems process trillions of rows to power dashboards used for monitoring, experimentation, and operational decision making. These dashboards frequently execute near-identical queries, such as those computing error rates or engagement metrics, over sliding time windows. While the intent of each query remains the same, minor shifts in time boundaries cause traditional caching systems to treat them as distinct requests, resulting in low cache reuse and repeated computation in Apache Druid.

Evan King, co-founder of Hello Interview, described the challenge in a post:

Repeated queries, such as errors in the last 3 hours, are treated as distinct requests by traditional caches, even though most of the underlying data remains unchanged.
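The mismatch is easy to see in a minimal sketch: a conventional result cache keys on a hash of the entire query, time range included, so two windows that overlap almost completely still produce different keys. The field names below are illustrative, not Druid's native query schema.

```python
import hashlib
import json

def naive_cache_key(query: dict) -> str:
    """A conventional result cache hashes the whole query, time range included."""
    return hashlib.sha256(json.dumps(query, sort_keys=True).encode()).hexdigest()

# Two "errors in the last 3 hours" queries issued five minutes apart:
q1 = {"metric": "errors", "interval": "2024-01-01T00:00/2024-01-01T03:00"}
q2 = {"metric": "errors", "interval": "2024-01-01T00:05/2024-01-01T03:05"}

# The windows overlap for 2h55m, yet the cache keys never match:
print(naive_cache_key(q1) == naive_cache_key(q2))  # False
```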

Netflix’s approach decomposes query results into time-aligned segments to enable reuse across overlapping rolling window queries. Instead of caching full query outputs, the system stores intermediate aggregates for fixed time intervals. When a new query arrives, cached segments are reused for stable historical portions of the time window, while only the most recent interval is recomputed from Druid and merged with cached results.

Query Structure & Cache-Key Separation (Source: Netflix Blog Post)
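The reuse mechanism can be sketched as follows. This is a minimal illustration, not Netflix's implementation: it assumes hourly buckets, uses an in-memory dict where Netflix uses a distributed key-value store, and `run_druid` stands in for the narrowed query actually sent to Druid.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(hours=1)  # assumed interval granularity

druid_calls = []  # records which narrowed intervals actually reach Druid

def run_druid(start, end):
    """Stand-in for issuing the narrowed query to Druid."""
    druid_calls.append((start, end))
    return {"interval": (start, end), "errors": 42}  # dummy partial aggregate

def aligned_buckets(start, end):
    """Split [start, end) into BUCKET-aligned sub-intervals, clamped to the window."""
    cur = start.replace(minute=0, second=0, microsecond=0)
    out = []
    while cur < end:
        out.append((max(cur, start), min(cur + BUCKET, end)))
        cur += BUCKET
    return out

def cached_query(cache, start, end, now):
    """Serve full, closed buckets from cache; send only the rest to Druid."""
    parts = []
    for b_start, b_end in aligned_buckets(start, end):
        full = b_start.minute == 0 and b_end - b_start == BUCKET
        closed = full and b_end <= now  # data in a closed bucket no longer changes
        key = b_start.isoformat()
        if closed and key in cache:
            parts.append(cache[key])
            continue
        agg = run_druid(b_start, b_end)
        if closed:
            cache[key] = agg  # reusable by later overlapping queries
        parts.append(agg)
    return parts  # a real proxy would merge these partial aggregates

# Two overlapping 3-hour windows, five minutes apart:
cache = {}
now1 = datetime(2024, 1, 1, 3, 5, tzinfo=timezone.utc)
cached_query(cache, now1 - timedelta(hours=3), now1, now1)
first = len(druid_calls)           # 4 misses on a cold cache
now2 = now1 + timedelta(minutes=5)
cached_query(cache, now2 - timedelta(hours=3), now2, now2)
second = len(druid_calls) - first  # only the 2 window edges hit Druid
```

The second query reuses the two stable historical buckets and recomputes only the partial intervals at the window edges, which is why the time ranges reaching Druid shrink so sharply.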

A key motivation and outcome of the approach was highlighted by Ben Sykes, a Netflix engineer, who noted:

A 33% drop in queries to Druid and a 66% improvement in P90 query times

At Netflix scale, with over 10 trillion rows in Apache Druid, repeated rolling window queries became a major bottleneck. The caching layer addresses this by using granularity-aligned buckets with exponential TTL policies, enabling long-lived caching of historical intervals while maintaining freshness for recent data. This balances data accuracy with performance efficiency. Architecturally, the caching layer operates as an external proxy that intercepts incoming queries, separates query structure from time intervals, and generates reusable cache keys. Cached segments are stored in a distributed key-value system, allowing independent expiration and efficient retrieval.

 

Architecture of Query Flow (Source: Netflix Blog Post)
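The two ingredients described above, separating query structure from time intervals and applying exponential TTLs, can be sketched as below. The hashing approach, the `base` and `cap` values, and the function names are assumptions for illustration; Druid's native queries do carry their time range in an "intervals" field.

```python
import hashlib
import json

def structural_key(query: dict) -> str:
    """Hash every part of the query except its time range, so queries that
    differ only in their window share one structural key (Druid native
    queries carry the range in an "intervals" field)."""
    body = {k: v for k, v in query.items() if k != "intervals"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def bucket_ttl_seconds(age_in_buckets: int, base: int = 60,
                       cap: int = 7 * 24 * 3600) -> int:
    """Exponential TTL: the older a closed bucket, the less likely its data
    is to change, so its cache entry may live longer (base/cap assumed)."""
    return min(base * 2 ** age_in_buckets, cap)

q1 = {"queryType": "timeseries", "dataSource": "metrics",
      "intervals": ["2024-01-01T00:00/2024-01-01T03:00"]}
q2 = {**q1, "intervals": ["2024-01-01T00:05/2024-01-01T03:05"]}
# Same structure, shifted window -> same structural key:
print(structural_key(q1) == structural_key(q2))  # True
```

A full cache key would combine the structural key with a bucket's start timestamp, which is what lets each interval expire independently in the distributed key-value store.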

With this design, only the most recent interval requires recomputation, while historical segments are reused across multiple overlapping queries. As a result, queries reaching Druid operate on significantly reduced time ranges, scanning fewer segments and processing less data. In some workloads, Netflix observed up to a 14x reduction in result bytes and substantial reductions in segment scans.

The system is currently deployed as an experimental layer and continues to evolve. Future work includes extending support to templated SQL queries used by dashboarding tools, reducing the need for native Druid query expressions. Netflix is also exploring tighter integration of interval-aware caching directly into Apache Druid to eliminate the need for an external proxy layer and improve query planning efficiency.

 
