Melanie Cebula, infrastructure engineer at Airbnb, gave a talk at QCon London [slides PDF] about the internal tooling and strategies Airbnb adopted to support over 1000 engineers concurrently configuring and deploying over 250 critical services to Kubernetes (at an average of about 500 deploys per day). One key enabler was a layer of abstraction that generates Kubernetes configuration from higher-level primitives, using standardized environments and namespaces (and automated validations whenever possible). Also critical were automating common workflows for engineers and using the same tools across all environments.
kube-gen is Airbnb's internal tool that takes a service's parameters (defined in a single YAML file) and generates the full Kubernetes service configuration by adding all the necessary boilerplate. In the past, Airbnb's use of file inheritance mechanisms for configuration (such as in Chef cookbooks) had led to cascading infrastructure failures. Therefore, one of the goals was to reduce the blast radius of potential mistakes by using YAML templates for service configuration.
The other major goal for kube-gen was to abstract away the complexities of Kubernetes configuration and tooling, so that engineering teams could retain ownership of their services' deployment, with the necessary levels of isolation (partially guaranteed by auto-generated namespaces based on standardized environment names) but without a long learning curve. Although kube-gen is not publicly available, as it addresses an Airbnb-specific context, Cebula pointed out some open source alternatives: helm (package management), kustomize (configuration via file inheritance), and kapitan (configuration via templating).
Figure: service configuration files in custom YAML get translated into Kubernetes required config files (one set per environment defined in the custom YAML) and then applied to the Kubernetes cluster (credit: Melanie Cebula, Airbnb)
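The generation step described above can be illustrated with a minimal sketch. kube-gen is not public, so everything here is an assumption made for illustration: the `SERVICE` parameter layout, the `namespace_for` and `generate` helpers, and the `<service>-<environment>` namespace naming are hypothetical and are not kube-gen's actual schema or behavior. The sketch only shows the general idea of expanding a few service parameters into one set of Kubernetes objects per environment.

```python
import json

# Hypothetical service parameters; the field names are illustrative only
# and do not reflect kube-gen's real input format.
SERVICE = {
    "name": "demo",
    "image": "registry.example.com/demo:1.0",
    "environments": {
        "staging": {"replicas": 1},
        "production": {"replicas": 3},
    },
}


def namespace_for(service_name, env_name):
    """Derive a namespace from the service and a standardized environment name.

    The "<service>-<environment>" convention is an assumption of this sketch.
    """
    return f"{service_name}-{env_name}"


def generate(service):
    """Return {environment: [manifest, ...]} with the boilerplate filled in."""
    manifests = {}
    for env_name, env in service["environments"].items():
        ns = namespace_for(service["name"], env_name)
        labels = {"app": service["name"], "env": env_name}
        namespace = {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": ns},
        }
        deployment = {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": service["name"], "namespace": ns, "labels": labels},
            "spec": {
                "replicas": env.get("replicas", 1),
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {
                        "containers": [
                            {"name": service["name"], "image": service["image"]}
                        ]
                    },
                },
            },
        }
        manifests[env_name] = [namespace, deployment]
    return manifests


def render(manifest):
    """Serialize one manifest as JSON, a format kubectl also accepts."""
    return json.dumps(manifest, indent=2)
```

Calling `generate(SERVICE)` yields one Namespace and one Deployment per environment; each can be passed through `render` and written to a file for `kubectl apply -f`. JSON is used here only to keep the sketch free of third-party dependencies.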
Further strategies to promote homogeneous and easy-to-evolve service configurations included creating a new service skeleton repository with a single command, validating configuration files at build and deploy time (checking not only syntax but also known issues in the values provided, such as an invalid project name or owner), and versioning the (generated) service configuration.
A newly created service's git repository includes both application and infrastructure boilerplate files (including for CI/CD), pre-filled with sensible defaults and good practices (such as autoscaling by default or documentation generation). Versioning service configurations (with a specific field in the YAML file) allows marking versions with issues, whether in kube-gen itself or service-specific, so they're never redeployed, as well as distributing different versions on different channels (for example, stable vs beta).
Figure: examples of Airbnb service configuration YAML files, including a version field (credit: Melanie Cebula, Airbnb)
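The validation and versioning ideas above could look roughly like the following sketch. All names and values are hypothetical (`BAD_VERSIONS`, `CHANNELS`, `KNOWN_OWNERS`, the `project`, `owner`, and `version` fields, and the naming rule); the talk summary does not describe how Airbnb implements these checks, so this is only one plausible shape for them.

```python
import re

# Assumed data for illustration: versions flagged as broken, channel
# pointers, and the set of valid owners. None of this comes from Airbnb.
BAD_VERSIONS = {"1.2.0"}
CHANNELS = {"stable": "1.1.0", "beta": "1.3.0"}
KNOWN_OWNERS = {"team-payments", "team-search"}

# Lowercase alphanumerics and hyphens, starting and ending alphanumeric.
NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def resolve_version(requested):
    """Map a channel name to a concrete version; pass other values through."""
    return CHANNELS.get(requested, requested)


def validate(config):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    name = config.get("project", "")
    if not NAME_RE.match(name) or len(name) > 63:
        errors.append(f"invalid project name: {name!r}")

    owner = config.get("owner")
    if owner not in KNOWN_OWNERS:
        errors.append(f"unknown owner: {owner!r}")

    version = resolve_version(config.get("version", "stable"))
    if version in BAD_VERSIONS:
        errors.append(f"version {version} is marked as bad and must not be deployed")

    return errors
```

Running the same `validate` function at both build time and deploy time would catch a bad value early while still blocking a deploy if a version is flagged after the build.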
k is another internal tool at Airbnb. It is mostly an opinionated wrapper for kubectl that also filters out some of kubectl's verbose output. k supports some extra functionality as well, such as wrapping the previously mentioned kube-gen tool and building and pushing Docker images. The goal for this tool was to automate common workflows, thereby simplifying and standardizing engineering work by abstracting away some of the complexity of Kubernetes tooling. It also got developers and infrastructure engineers to speak a common language and use the same tools, which has strengthened collaboration, according to Cebula.
A typical workflow would be to start with k generate to generate Kubernetes files, then k build to build Docker images and push them to a registry, and finally k deploy to create Kubernetes namespaces and apply the Kubernetes files, waiting for a final deployment status. Services are built and deployed the same way regardless of the environment (namely a local machine, CI, staging, or production). It is also possible to run k diagnose, which relies on a couple of plugins created by Airbnb: kubectl diagnose and kubectl podevents. The goal was to automate common manual steps when debugging a deployment issue: collecting information on unready containers, finding related pod events, and getting the logs for those containers.
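The debugging steps described above can be approximated with plain kubectl. The sketch below is not Airbnb's kubectl diagnose or kubectl podevents plugin, whose internals are not covered here; the function names are hypothetical. It assumes the input is the parsed output of `kubectl get pods -n <namespace> -o json` and returns the follow-up commands a person would otherwise type by hand.

```python
def unready_containers(pod_list):
    """Return (pod_name, container_name) pairs for containers not ready.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    found = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            if not status.get("ready", False):
                found.append((pod_name, status["name"]))
    return found


def follow_up_commands(namespace, pod_list):
    """Build kubectl commands for events and logs of each unready container."""
    commands = []
    for pod_name, container in unready_containers(pod_list):
        # Events that refer to this pod.
        commands.append([
            "kubectl", "get", "events", "-n", namespace,
            "--field-selector", f"involvedObject.name={pod_name}",
        ])
        # Logs of the specific container that is not ready.
        commands.append([
            "kubectl", "logs", pod_name, "-c", container, "-n", namespace,
        ])
    return commands
```

Each returned command is an argument list suitable for `subprocess.run`; keeping command construction separate from execution makes the logic easy to test without a cluster.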
Finally, Cebula mentioned some remaining challenges in Airbnb's Kubernetes adoption journey, in particular the migration of thousands more existing services, which requires better multicluster support and scaling (some services use hundreds of replicas), handling more stateful services with high memory requirements, and moving all configuration to a GitOps workflow model with custom controllers.