Managing Global CDN Operations at Netflix

At the recent Strange Loop conference, Robert Fernandes, the engineering manager who leads Netflix’s Open Connect Tools team, gave a talk on how the team manages operations for Netflix’s in-house Open Connect content delivery network (CDN).

The talk summarized Netflix's move to its in-house CDN, Open Connect, and the challenges the move brought for its operations teams. Open Connect is the umbrella term for the teams that handle content delivery to subscribers, including technical operations, inventory management and partner management. These teams felt a strong need to automate their activities, and different teams built various monolithic applications to do so. This brought new challenges for deployment. Fernandes described how they plan to mitigate these challenges.

Most Netflix services run on AWS. Netflix started streaming in 2007, and by 2009 had built an internal control center called the Netflix Content Control Plane (NCCP). NCCP's job was to steer the end user to the right edge (CDN location), whereas the actual content delivery was done by third-party providers like Akamai, Level 3 Communications, and Limelight Networks. The team moved to an internal CDN in 2011, calling it Open Connect, and with it came the entire job of managing the infrastructure.

When a Netflix application client (mobile, desktop, etc.) requests a video, it typically receives three domain names, from which it requests the content via HTTPS. Open Connect serves the content and any static resources used by the application, such as JavaScript. The CDN is built from customized cache servers called Open Connect Appliances (OCAs). They run a custom fork of FreeBSD with NGINX.
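The talk summary does not say how a client chooses among the domain names it receives. As a minimal sketch, assuming the client simply tries them in order and falls back on a connection failure, the logic could look like the following. The hostnames, the `fetch_with_fallback` helper and the injected `fetch` callable are hypothetical and exist only for illustration; they are not Netflix's client code.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_fallback(
    urls: Iterable[str], fetch: Callable[[str], bytes]
) -> Tuple[str, bytes]:
    """Try each candidate URL in order; return the first one that succeeds.

    `fetch` is injected so the sketch stays independent of any HTTP library.
    Assumption: failures surface as OSError (ConnectionError is a subclass).
    """
    last_error: Optional[Exception] = None
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next host
    raise ConnectionError("all candidate hosts failed") from last_error


# Hypothetical candidate URLs, standing in for the three domain names.
CANDIDATES = [
    "https://oca1.example.net/title/123",
    "https://oca2.example.net/title/123",
    "https://oca3.example.net/title/123",
]


def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTPS GET: the first host is treated as unreachable."""
    if "oca1" in url:
        raise ConnectionError("unreachable")
    return b"video-bytes"


served_from, payload = fetch_with_fallback(CANDIDATES, fake_fetch)
```

Here the first host fails, so the content is served from the second one. A real client would likely also weigh measured throughput or latency, which this sketch leaves out.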

Two kinds of cache servers are configured. A storage appliance stores catalogs in bulk on HDDs, which are relatively slow. The flash-based or "offload" server has solid-state drives and comparatively higher throughput, and is mostly used for serving popular content. Netflix content servers (over 10,000 appliances) are deployed across thousands of sites globally. Some sites are handled by ISPs with servers provided by Netflix, whereas in others Netflix directly controls and manages the hardware. This architecture is complemented by Netflix's backbone network.
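The split between the two appliance types can be illustrated with a small placement sketch. It assumes popularity is measured by a simple view count and that the flash tier holds a fixed number of titles; both are simplifications made for illustration, and the `plan_placement` function is hypothetical rather than a description of Netflix's actual placement logic.

```python
from typing import Dict


def plan_placement(view_counts: Dict[str, int], flash_slots: int) -> Dict[str, str]:
    """Assign the most-viewed titles to the flash tier, the rest to storage.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(view_counts, key=lambda title: (-view_counts[title], title))
    return {
        title: ("flash" if position < flash_slots else "storage")
        for position, title in enumerate(ranked)
    }


# Hypothetical view counts for three titles, with room for one on flash.
placement = plan_placement({"title-a": 10, "title-b": 500, "title-c": 90}, flash_slots=1)
```

With these numbers, only `title-b` lands on the flash tier, and the other two stay on the HDD-backed storage appliances.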

The Open Connect teams are split into development, operations, network management and non-technical functions like partner management and shipping. The teams started by solving problems within their own verticals, which led to monolithic applications. The Open Connect Tools team was formed to mitigate this. It takes care of alerting, monitoring, config management, deployment automation, inventory management, logging and metrics, and partner self-service. The team working on the control plane focuses more on content placement, geographical aspects, routing and security of the CDN. The talk did not delve much into the technical details of the applications or of the automation. The future plan is to take a more "layered approach" with microservices, and to build common solutions that can be shared across teams.
