
Netflix Attempts to Reconcile Large Scale APIs with Developer Autonomy


Katharina Probst and Justin Becker, engineering managers at Netflix, recently wrote an article for Netflix's tech blog on maintaining developer autonomy in API environments. The August 23 post, "Engineering Trade-Offs and the Netflix API Re-Architecture", explores the difficulty of reconciling developers' ownership of their code and processes with shared services that span multiple teams in API environments.

The rise of microservices, and the software engineering community's increasing emphasis on entirely self-contained, self-maintained software stacks (for example, the popularity of container-based development with tools like Docker), can be at odds with the needs of consumers, who must access data from many distinct services without adding considerable complexity to their applications. Microservices also have a complicated relationship with industry-standard best practices around code re-use and collaboration, since re-using shared code creates a dependency within the microservice on external software.

In their blog post, Probst and Becker write, "...we work to reconcile seemingly conflicting engineering principles: velocity and full ownership vs. maximum code reuse and consolidation." Since APIs imply communication between multiple services, it can be tricky for a single team to retain ownership of how its data is used. If every microservice exposes its own API directly to consumers, the service itself must shoulder the burden of all its consumers' varied requests, undermining the notion of a fully independent, maximally productive service. If, on the other hand, a single API serves as a buffer layer for all microservices, individual services have far less control over how consumers actually use their data, and that API becomes a catch-all for every possible consumer request.
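The buffer-layer side of this tension can be illustrated with a minimal sketch: a single orchestration API that fans out a consumer request to several microservices and composes one response, so no consumer calls the services directly. The service names, functions, and returned fields below are hypothetical illustrations, not Netflix's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for clients of two independent microservices.
def fetch_user_profile(user_id):
    return {"user_id": user_id, "plan": "standard"}

def fetch_recommendations(user_id):
    return {"titles": ["title-a", "title-b"]}

def orchestrate_home_page(user_id):
    """Buffer-layer API: fan out to the underlying microservices in
    parallel and compose a single response for the consumer."""
    with ThreadPoolExecutor() as pool:
        profile = pool.submit(fetch_user_profile, user_id)
        recs = pool.submit(fetch_recommendations, user_id)
        return {"profile": profile.result(),
                "recommendations": recs.result()}
```

The composition logic lives in one place, which is convenient for consumers but, as the post notes, pulls control over data usage away from the individual service teams.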

Probst spoke at QCon New York 2016 about how Netflix may change its API to better suit the needs of many autonomous applications. Netflix has one API that acts as an orchestration service between different microservices and their individual APIs. While this API takes the burden of consumer requests from more than 1,000 different devices off of each individual microservice, it also introduces a single point of failure: if the API goes down, every consumer service is affected, rather than a small group of associated users. Probst plans to mitigate the risk of service contamination in future versions of the API via containers. In her QCon talk, she said, "In the future when a script has a problem for a large class of problems...when one device or the scripts for one device are unavailable it doesn't affect the other devices and it doesn't affect the API." By keeping a single orchestration API but mitigating risk through container-based process isolation, Probst can maintain one API with which all consumer-facing microservices communicate, providing an ideal platform for shared tooling and services, a notorious pain point for many microservices.
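The fault-containment contract Probst describes can be sketched as follows: a failure in one device's script must leave every other device's response, and the API itself, unaffected. Netflix plans to achieve this with container-based process isolation; the sketch below only models the contract in-process, and the device names and scripts are hypothetical.

```python
# Hypothetical device-specific adaptation scripts.
def ios_script(data):
    return {"layout": "mobile", "titles": data["titles"]}

def legacy_tv_script(data):
    # Simulate the failing script for one class of devices.
    raise RuntimeError("broken script")

DEVICE_SCRIPTS = {"ios": ios_script, "legacy-tv": legacy_tv_script}

def render_for_devices(data):
    """Run each device-specific script, containing any failure so it
    affects only that device, never the API or the other devices."""
    results = {}
    for device, script in DEVICE_SCRIPTS.items():
        try:
            results[device] = script(data)
        except Exception:
            results[device] = {"error": "device script unavailable"}
    return results
```

Here the containment boundary is a try/except; in the architecture Probst describes, the boundary is a container around each script's process, which also isolates resource exhaustion and crashes that in-process error handling cannot catch.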

While Probst has settled some key API decisions, like using containers to isolate scripts, others clearly remain without an optimal solution. One of the main topics in the blog post is whether to have multiple orchestration APIs, which would give the underlying services greater control over orchestration, or to have the existing API contain less logic and serve more strictly as an interface to the data, with most of the logic for massaging and augmenting the data before serving it to consumers living in service-group-specific logic layers. With the first approach, it becomes hard to keep the different orchestration APIs in sync, which creates barriers to sharing software across service groupings. With the second approach, it is hard to justify the added latency, which buys no real functionality, only a finer granularity of control between services. The blog post ends without a final decision, but suggests that whatever Netflix chooses will be a compromise among tradeoffs. As the number of isolated, self-contained services continues to grow alongside the need for common tooling, libraries, and consumer connectivity, there may be no perfect solution.
