Scaling Video Quality Measurements at Netflix with Cosmos

Netflix relies heavily on measuring perceptual video quality for different business purposes. As metrics evolve and become part of more workflows, their measurement tool needs to scale too. Netflix recently described how a new video quality measurement workflow was implemented using Cosmos microservices to foster innovation in quality metrics, with good scalability and loose data coupling.

Perceptual video quality measurements were initially generated at Netflix by their production system, called Reloaded. Reloaded has many responsibilities and two significant disadvantages.

First, it is a monolithic system, which prevents rapid evolution, for example when metric algorithms are updated. Second, it produces video quality metrics only during video encoding, so updating a metric is infeasible: every update would require a costly re-encoding of the videos.

Netflix teams developed a new solution to address these and other concerns with their Reloaded architecture: Cosmos.

Cosmos is a computing platform for workflow-driven, media-centric microservices. Cosmos offers several benefits (...) such as separation of concerns, independent deployments, observability, rapid prototyping and productisation.

A new perceptual video quality measurement service was developed using Cosmos' architecture. The new independent service is called the Video Quality Service (VQS).

VQS takes as input two videos: a source and its derivative and returns back the measured perceptual quality of the derivative. (...) There is an external-facing API layer (Optimus), a rule-based video quality workflow layer (Plato), and a serverless compute layer (Stratum).

Optimus, Plato and Stratum are the names of the Cosmos subsystems. The layers communicate via a queuing system.
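To illustrate what layers that communicate only through queues can look like, here is a minimal Python sketch. It is not Cosmos code: the function names, the message fields and the use of the standard library's in-process `queue.Queue` are all assumptions made for illustration, and the real queuing system is not described in this article.

```python
import queue


def api_layer(request, to_workflow):
    """API layer stand-in: accept a request and hand it to the workflow layer."""
    to_workflow.put(request)


def workflow_layer(to_workflow, to_compute):
    """Workflow layer stand-in: turn each request into one task per chunk."""
    while not to_workflow.empty():
        request = to_workflow.get()
        for chunk in range(request["chunks"]):
            to_compute.put({"job": request["job"], "chunk": chunk})


def compute_layer(to_compute, results):
    """Compute layer stand-in: process each task and publish a result message."""
    while not to_compute.empty():
        task = to_compute.get()
        results.put({**task, "status": "scored"})


def run_pipeline(request):
    """Drive the three layers in order; each one only sees its queues."""
    to_workflow, to_compute, results = queue.Queue(), queue.Queue(), queue.Queue()
    api_layer(request, to_workflow)
    workflow_layer(to_workflow, to_compute)
    compute_layer(to_compute, results)
    return [results.get() for _ in range(results.qsize())]
```

Because each layer only reads from and writes to queues, any one of them could in principle be redeployed or scaled without the others knowing, which is the loose coupling the article refers to.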

Figure: Overview of the architecture of VQS on Cosmos.

VQS's API exposes endpoints to request quality measurement and retrieve the results asynchronously.
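The request-then-retrieve pattern can be sketched as follows. This is an in-memory stand-in, not the VQS API: the class name, method names, status strings and example URIs are hypothetical, since the article does not describe the actual endpoints.

```python
import uuid


class QualityMeasurementApi:
    """In-memory stand-in for an asynchronous request/retrieve API."""

    def __init__(self):
        self._jobs = {}

    def request_measurement(self, source_uri, derivative_uri):
        """Register a measurement job and return its id immediately."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {
            "source": source_uri,
            "derivative": derivative_uri,
            "status": "PENDING",
            "score": None,
        }
        return job_id

    def complete(self, job_id, score):
        """Called by the (hypothetical) backend once the score is ready."""
        job = self._jobs[job_id]
        job["status"] = "DONE"
        job["score"] = score

    def get_result(self, job_id):
        """Return the current status and, once available, the score."""
        job = self._jobs[job_id]
        return {"status": job["status"], "score": job["score"]}
```

The caller never blocks on the measurement itself: it keeps the job id and polls `get_result` until the status changes.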

Its workflow rules divide the video into chunks so that quality metrics can be calculated in parallel. This parallelisation leverages Netflix's scale to increase throughput and reduce latency for each measurement.

VQS then assembles individual chunk quality metrics to produce the final video quality metrics.
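The chunk, measure, assemble flow can be sketched in Python. Several things here are assumptions rather than details from Netflix: the per-frame score is a toy placeholder (not VMAF), frames are modelled as short lists of sample values, a thread pool stands in for the serverless compute layer, and the frame-weighted mean is just one plausible way to assemble chunk results.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkScore:
    start: int          # first frame index in the chunk (inclusive)
    frame_count: int    # number of frames scored in the chunk
    mean_score: float   # mean per-frame score for the chunk


def split_into_chunks(total_frames, chunk_size):
    """Return (start, end) frame ranges, end exclusive, covering the video."""
    return [(s, min(s + chunk_size, total_frames))
            for s in range(0, total_frames, chunk_size)]


def placeholder_frame_score(source_frame, derivative_frame):
    """Toy metric (NOT VMAF): 100 minus the mean absolute sample difference."""
    diffs = [abs(a - b) for a, b in zip(source_frame, derivative_frame)]
    return 100.0 - sum(diffs) / len(diffs)


def score_chunk(source, derivative, start, end):
    """Score every frame in [start, end) and summarise the chunk."""
    scores = [placeholder_frame_score(source[i], derivative[i])
              for i in range(start, end)]
    return ChunkScore(start, end - start, sum(scores) / len(scores))


def measure(source, derivative, chunk_size=2, workers=4):
    """Score chunks in parallel, then assemble a frame-weighted mean."""
    chunks = split_into_chunks(len(source), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda c: score_chunk(source, derivative, *c), chunks))
    total_frames = sum(r.frame_count for r in results)
    return sum(r.mean_score * r.frame_count for r in results) / total_frames
```

Weighting each chunk by its frame count keeps the assembled score equal to the mean over all frames even when the last chunk is shorter than the others.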

Developing VQS was one of many steps in migrating the Reloaded system to Cosmos. The migration is incremental, so some workflows that depend on video quality metrics are still Reloaded-based.

This cross-dependency led to the introduction of some workflows in Reloaded to bridge the two systems by routing video quality traffic from Reloaded into Cosmos.

The redesign that led to VQS had a second goal besides decoupling video quality measurement from video encoding: decentralising quality score storage. Quality scores were moved out of the centralised Reloaded storage and into the Netflix Media Database (NMDB), with the added benefit that scores can now be queried. Naturally, the data format used by VQS in Cosmos differs from Reloaded's.

Since their data models are different in structure and storage, another service was introduced, called the Document Conversion Service (DCS). DCS converts between Cosmos and Reloaded data models and interfaces with NMDB and Reloaded storage.
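A two-way conversion of this kind can be sketched as below. Both record shapes are invented for illustration: the article does not describe the Reloaded or Cosmos data models, so the field names (`video_id`, `scores`, `videoId`, `metrics`) and the function names are hypothetical.

```python
def reloaded_to_cosmos(record):
    """Convert a hypothetical flat Reloaded-style record to a Cosmos-style document."""
    return {
        "videoId": record["video_id"],
        "metrics": [
            {"name": name, "value": value}
            for name, value in sorted(record["scores"].items())
        ],
    }


def cosmos_to_reloaded(document):
    """Convert the hypothetical Cosmos-style document back to the flat record."""
    return {
        "video_id": document["videoId"],
        "scores": {m["name"]: m["value"] for m in document["metrics"]},
    }
```

Keeping both directions in one place means the rest of each system only has to deal with its own model while the migration is in progress.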

The analyst Tommy Flanagan wrote that Netflix's commitment to Cosmos "is an example of how fusing infrastructure and media algorithm developer teams together can realize a vision that would not be possible in your typical top-down engineering environment".

Netflix currently uses the VMAF metric to measure its streaming video quality and continues working on its video quality feature algorithms.