At QCon New York 2017, Yunong Xiao presented "The Paved PaaS to Microservices at Netflix", which discussed how the Netflix Platform as a Service (PaaS) helps to maintain the balance between the company's culture of freedom and responsibility and the overall organisational goals of velocity and reliability. The Netflix PaaS team attempts to provide a sensibly configured but customisable "paved road" platform for developers by offering standardised and compatible components, pre-assembling the platform, and providing extensive automation and tooling.
Xiao, Principal Software Engineer at Netflix, began the talk by referencing the Wikipedia definition of PaaS: "...allows customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure and platform". Within the Netflix technical stack, functionality provided by the PaaS includes microservice remote procedure calls (RPC), service discovery and registration, operating system, application runtime, configuration, metrics, logging, tracing, dashboards, alerts and stream processing.
At Netflix the backend services are typically deployed onto a Java/JVM runtime which is fronted by an Edge API, but client teams own standalone services that they create in order to meet the needs of their associated end-user delivery technologies, such as smart TVs, iOS and MS Windows. These services are typically developed using JavaScript and Node.js, and the delivery teams are not necessarily familiar with backend operations and platforms. Netflix famously embraces a culture of freedom and responsibility ("F&R"), and this must be balanced with the overall organisational goals of velocity and reliability. Functionality provided by a PaaS can help with this balance, and Netflix approaches this in three main ways in order to provide a homogenised but configurable "paved road" for developers: standardised components, a pre-assembled platform, and automation and tooling.
Standardisation can provide consistency, leverage for rapid construction, interoperability, quality guarantees, and easier support. With regard to any failure within the Netflix platform, the Mean Time To Detection (MTTD) and Mean Time To Repair (MTTR) are vital for any consumer-facing service, and standardising microservice RPC communication on a single mechanism, such as gRPC, makes instrumenting calls and debugging failures much easier. However, the freedom and responsibility culture at Netflix means that developers must also be free to innovate, experiment and integrate new approaches into the entire technology stack, and so the platform provides a pluggable interface.
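As an illustration of why a single RPC mechanism behind a pluggable interface simplifies instrumentation, the following is a minimal TypeScript sketch. All names here (`RpcTransport`, `InstrumentedTransport`, `EchoTransport`, `CallRecord`) are hypothetical and are not taken from the talk or from any Netflix library; the sketch only assumes that every transport implements one shared interface, so that timing and success tracking can be added in one place.

```typescript
// Hypothetical names throughout; this is not Netflix's actual API.
interface RpcTransport {
  call(service: string, method: string, payload: unknown): Promise<unknown>;
}

interface CallRecord {
  service: string;
  method: string;
  ok: boolean;
  durationMs: number;
}

// Wraps any transport so that every call is timed and recorded in one place.
class InstrumentedTransport implements RpcTransport {
  readonly records: CallRecord[] = [];
  private readonly inner: RpcTransport;

  constructor(inner: RpcTransport) {
    this.inner = inner;
  }

  async call(service: string, method: string, payload: unknown): Promise<unknown> {
    const start = Date.now();
    let ok = false;
    try {
      const result = await this.inner.call(service, method, payload);
      ok = true;
      return result;
    } finally {
      // Runs for both successful and failed calls; failures still propagate.
      this.records.push({ service, method, ok, durationMs: Date.now() - start });
    }
  }
}

// A stand-in transport for the sketch; a real plug-in might delegate to a gRPC client.
class EchoTransport implements RpcTransport {
  async call(_service: string, method: string, payload: unknown): Promise<unknown> {
    if (method === "fail") {
      throw new Error("simulated failure");
    }
    return payload;
  }
}
```

Because the wrapper depends only on the interface, a team could swap `EchoTransport` for a different implementation without losing the recorded call data.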
The Netflix PaaS attempts to provide all of the required core features pre-assembled "out of the box", which removes the need for copy/paste, and prevents version incompatibilities and missing components. Application, system, and runtime metrics and logs are enabled by default, and standardised dashboards allow any team to debug high-level service issues. The remainder of the bundled functionality is provided as "layers and flavours", which allows teams to mix and match required components such as data access, backend and rendering technologies. Semantically versioned implementations of the PaaS also take care of compatibility between libraries, and ensure that components initialise correctly. Platform correctness and upgrades are validated through extensive testing, and the PaaS team "eat their own dog food" by deploying their own services onto any new version of the platform.
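To make the idea of semantic-version compatibility concrete, here is a small TypeScript sketch of a caret-style check: a provided version is treated as compatible when it shares the required major version and is at least as new. The talk did not describe how Netflix performs this check, so the rule and the function names (`parseVersion`, `isCompatible`) are assumptions for illustration; the sketch also ignores pre-release tags and the special treatment that semantic versioning gives to `0.x` releases.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

// Parses a plain MAJOR.MINOR.PATCH string; pre-release and build tags are rejected.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    throw new Error(`not a MAJOR.MINOR.PATCH version: ${text}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Assumed rule: same major version, and the provided version is not older
// than the required one.
function isCompatible(required: string, provided: string): boolean {
  const r = parseVersion(required);
  const p = parseVersion(provided);
  if (r.major !== p.major) {
    return false;
  }
  if (p.minor !== r.minor) {
    return p.minor > r.minor;
  }
  return p.patch >= r.patch;
}
```

Under this rule, a library that requires `2.3.1` would accept a platform component at `2.4.0` but reject `3.0.0` (a breaking major change) and `2.3.0` (older than required).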
Applications are deployed to the PaaS using a typical workflow of development, testing (using Jenkins), deployment (via Spinnaker) and operations. The platform provides a CLI for common development tasks, such as environment bootstrap, integration with continuous delivery tooling, and running locally and in the cloud. Local development uses Docker, with JavaScript code being live-reloaded within the container and debuggers attached if required. In addition to the testing support provided by the continuous delivery pipelines, the platform also provides first-class mocks that allow a developer to test against functionality provided by the platform, and to simulate and stub required responses.
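The value of platform-provided mocks can be sketched as follows. This TypeScript example assumes a hypothetical platform configuration client (`ConfigClient`) and a matching mock (`MockConfigClient`); neither name comes from the talk, and the real Netflix mocks may look quite different. The point is only that application code written against an interface can be tested with stubbed responses and without a running platform.

```typescript
// Hypothetical platform component: a read-only configuration client.
interface ConfigClient {
  get(key: string): string | undefined;
}

// A mock that returns stubbed values and records which keys were requested.
class MockConfigClient implements ConfigClient {
  readonly requestedKeys: string[] = [];
  private readonly stubs = new Map<string, string>();

  // Registers a canned response; returns `this` so stubs can be chained.
  stub(key: string, value: string): this {
    this.stubs.set(key, value);
    return this;
  }

  get(key: string): string | undefined {
    this.requestedKeys.push(key);
    return this.stubs.get(key);
  }
}

// Example application code that depends only on the interface.
function isFeatureEnabled(config: ConfigClient, feature: string): boolean {
  return config.get(`feature.${feature}.enabled`) === "true";
}
```

A test can then construct `new MockConfigClient().stub("feature.search.enabled", "true")`, pass it to `isFeatureEnabled`, and inspect `requestedKeys` to confirm which configuration keys the code under test actually read.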
Pre-configured pipelines for deployment and rollback are also provided, and a single command deploys to any stack, with automated canary analysis and autoscaling pre-configured. Customisable dashboards and alerting are automatically generated, and operational analytics and tooling such as CPU profiling and core dump analysis are also integrated into the platform tooling.
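As a rough illustration of what a canary judgement involves, the TypeScript sketch below compares the error rate of a canary deployment against a baseline and recommends promotion or rollback. This is a deliberately simplified threshold comparison, not the analysis Netflix uses, which the talk did not detail; the names (`RequestStats`, `judgeCanary`) and the `maxIncrease` tolerance are assumptions made for the example.

```typescript
interface RequestStats {
  requests: number;
  errors: number;
}

type Verdict = "promote" | "rollback";

// Fraction of requests that failed; an idle deployment is treated as error-free.
function errorRate(stats: RequestStats): number {
  return stats.requests === 0 ? 0 : stats.errors / stats.requests;
}

// Assumed rule: promote unless the canary's error rate exceeds the
// baseline's by more than `maxIncrease` (an absolute difference).
function judgeCanary(
  baseline: RequestStats,
  canary: RequestStats,
  maxIncrease: number,
): Verdict {
  const increase = errorRate(canary) - errorRate(baseline);
  return increase <= maxIncrease ? "promote" : "rollback";
}
```

A production-grade analysis would typically compare many metrics and account for statistical noise; the single-metric rule here only shows the shape of the decision that a pre-configured pipeline automates.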
Additional information on Yunong Xiao's presentation "The Paved PaaS to Microservices at Netflix" can be found on the QCon New York website, and the video recording will be released over the coming months.