From On-Demand to Live: Netflix Streaming to 100 Million Devices in Under 1 Minute

Netflix has detailed the architecture behind its global live streaming platform, outlining how it built a cloud-based ingest pipeline, a custom live origin system, an expanded delivery layer through Open Connect, and real-time recommendations that support large live events for millions of viewers. The three-part Behind the Streams blog series describes more than three years of work to move live streaming from an early-stage experiment to a reliable, low-latency system integrated with Netflix’s broader playback and personalization systems.

The ingest and transcoding pipeline forms the foundation of the system. Netflix engineers explain that the company used cloud services such as AWS MediaConnect and MediaLive to bring in live broadcast feeds and convert them into adaptive bitrate formats suitable for its diverse device ecosystem. The team notes that software-based transcoding provided more configuration flexibility for formats, resolutions, and codecs. This flexibility allowed rapid iteration during event production and made it possible to handle a wide range of content sources across global partners.

Dedicated broadcast facilities to ingest live content from production (Source: Netflix Blog Post)

A second major element is the custom live packaging and origin layer. Instead of relying on a third-party packager, Netflix built an internal system that prepares segments, manages encryption, generates manifests, and stores time-critical assets in a dedicated Live Origin environment. According to Netflix engineers, this system was designed to meet strict latency and durability requirements while supporting multiple audio tracks, captions, and both AVC and HEVC codec families. The team emphasized that the live origin must serve millions of read requests every few seconds while maintaining consistent performance during traffic spikes.

Delivery for live events extends the capabilities of Open Connect, Netflix’s global content delivery network. By publishing rapidly produced segments to geographically distributed edge locations, the platform reduces load on central origins and minimizes latency for viewers worldwide. Engineers stated this approach was necessary for synchronized viewing experiences such as sports and special events where even small differences in segment arrival times can affect the overall experience for viewers.

Live-origin Launch (Source: Netflix Tech Blog)
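
To illustrate why serving segments from distributed edge locations reduces load on a central origin, here is a minimal sketch of a generic pull-through cache. The `Origin` and `EdgeCache` classes and the `seg-42` identifier are hypothetical names chosen for this example; the sketch does not describe how Open Connect or the Live Origin are actually implemented, and a real deployment may push segments to edges rather than pull them.

```python
class Origin:
    """Hypothetical central origin that counts how often it is read."""

    def __init__(self):
        self.segments = {}
        self.reads = 0

    def publish(self, segment_id, data):
        # The packager stores a freshly produced segment.
        self.segments[segment_id] = data

    def read(self, segment_id):
        self.reads += 1
        return self.segments[segment_id]


class EdgeCache:
    """Hypothetical edge location: fetches each segment from the origin once."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, segment_id):
        # Only a cache miss reaches the origin; later viewers are served locally.
        if segment_id not in self.cache:
            self.cache[segment_id] = self.origin.read(segment_id)
        return self.cache[segment_id]


origin = Origin()
origin.publish("seg-42", b"video-bytes")

# Three edge locations, each serving 1,000 viewer requests for the same segment.
edges = [EdgeCache(origin) for _ in range(3)]
for edge in edges:
    for _ in range(1000):
        edge.get("seg-42")

print(f"viewer requests: 3000, origin reads: {origin.reads}")
```

In this toy setup the origin is read once per edge location rather than once per viewer, which is the effect the delivery layer relies on at a much larger scale.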

Real-time discovery introduces unique challenges as traditional recommendation systems rely on precomputation and caching, which fail for fast-moving live content. Netflix redesigned its infrastructure so live events appear quickly across the UI, even during demand spikes. To keep millions of devices in sync, it uses a two-phase system: prefetching recommendations ahead of time to spread load and broadcasting low-cardinality updates in real time to trigger instant cache updates. This approach ensures reliability, avoids thundering herd problems, and adapts dynamically to schedule changes. According to Netflix engineers, the system delivered updates to more than 100 million devices in under one minute during peak load.

A phased approach optimized for each constraint for real-time recommendation (Source: Netflix Tech Blog)
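
The two-phase idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Netflix's implementation: the `Device` class, the `schedule_prefetch` and `peak_requests_per_second` helpers, the state names `upcoming` and `live`, and the 600-second prefetch window are all hypothetical. Phase one spreads the heavy, personalized fetches across a time window using random offsets; phase two sends only a small state signal that each device resolves against what it has already prefetched.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical client that holds prefetched rows keyed by event state."""
    device_id: int
    prefetched: dict = field(default_factory=dict)
    visible_rows: list = field(default_factory=list)

    def prefetch(self, payload_by_state):
        # Phase 1: heavy, personalized payload fetched ahead of the event.
        self.prefetched = dict(payload_by_state)

    def on_broadcast(self, state):
        # Phase 2: a small signal resolved locally, with no server round trip.
        # Unknown states leave the current rows unchanged.
        self.visible_rows = self.prefetched.get(state, self.visible_rows)


def schedule_prefetch(device_ids, window_seconds, rng):
    """Give each device a random offset so prefetches spread over the window."""
    return {d: rng.uniform(0, window_seconds) for d in device_ids}


def peak_requests_per_second(offsets, window_seconds):
    """Count prefetch requests per one-second bucket and return the busiest."""
    buckets = [0] * window_seconds
    for offset in offsets.values():
        buckets[min(int(offset), window_seconds - 1)] += 1
    return max(buckets)


rng = random.Random(7)
devices = [Device(i) for i in range(10_000)]

# Phase 1: 10,000 prefetches spread over a 10-minute window.
offsets = schedule_prefetch([d.device_id for d in devices], 600, rng)
print("peak prefetches in any one second:", peak_requests_per_second(offsets, 600))
for d in devices:
    d.prefetch({"upcoming": ["Starts soon"], "live": ["Live now"]})

# Phase 2: one low-cardinality message flips every device to the live state.
for d in devices:
    d.on_broadcast("live")
print("devices showing live row:", sum(d.visible_rows == ["Live now"] for d in devices))
```

Without the random offsets, all 10,000 prefetches could land in the same second, which is the thundering herd pattern the design avoids. For scale, reaching 100 million devices in under 60 seconds implies an average of roughly 1.7 million device updates per second, which helps explain why the broadcast message is kept small.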

Netflix’s real-time monitoring combines internal tools such as Atlas, Mantis, and Lumen with open-source technologies such as Kafka and Druid, processing up to 38 million events per second while delivering critical metrics within seconds. Dedicated ‘Control Center’ facilities bring key metrics together so the operations team can monitor events in real time.

Netflix’s orchestration system automates ingest, encoding, packaging, and origin setup, minimizing human error and enabling rapid response. Redundant pipelines across regions provide resilience, while Control Room offers real-time visibility, failover controls, and lifecycle management for live events. This combination of streamlined operations and optimized resources helped deliver a reliable, high-quality experience for millions of viewers worldwide.
