Scaling and Automating Microservice Testing at Lyft

Lyft used cloud-based isolated environments for several purposes, including end-to-end testing. As the number of microservices grew, tests using these environments became harder to scale and less valuable. A recent series of articles on Lyft's engineering blog describes how the company shifted to testing with request isolation in a shared staging environment and began using acceptance tests to gate production deployments.

Lyft built a Docker-based container orchestration environment that engineers could use for testing. It consisted of tooling that managed a local virtual machine and its configuration, including seeding databases and downloading and installing packages and images. Initially intended for local use, this environment later moved to the cloud, where it was called Onebox.

Engineers could use Onebox to run the service they wanted to test together with its dependencies and related data stores: in a nutshell, a miniature version of Lyft's systems. Eventually, however, Onebox reached a point where it could no longer scale.

Lyft also has a shared staging environment that is production-like in terms of (simulated) traffic and service levels. Because of these characteristics, teams were already deploying feature branches of their services there to get feedback based on realistic data. This made staging a good candidate to replace Onebox for testing new features, but it offered no service isolation: an unstable new feature could cause problems for the entire staging environment.

The solution was to implement "staging overrides" in the staging environment, which "fundamentally shifted [Lyft's] approach for the isolation model: instead of providing fully isolated environments, [they] isolated requests within a shared environment."

Instead of isolating entire services, the new approach isolates requests.

(Source: https://eng.lyft.com/scaling-productivity-on-microservices-at-lyft-part-3-extending-our-envoy-mesh-with-staging-fdaafafca82f)

Using this technique, engineers can deploy and start service instances that do not participate in Lyft's service mesh and therefore do not interfere with regular traffic. These instances are called "offloaded deployments". When engineers want to test a new feature, they add specific headers to their requests, which cause those requests to be routed through the new instance.
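The per-request routing decision can be sketched as follows. This is a minimal illustration under stated assumptions, not Lyft's implementation: the header name `x-staging-override`, its `service=address` value format, the `pricing` service, and the `Instance` and `pick_upstream` names are all hypothetical. The point it shows is that the override travels with the request itself, so only requests that opt in ever reach the offloaded instance.

```python
from dataclasses import dataclass

# Hypothetical header carrying per-request overrides (illustration only).
OVERRIDE_HEADER = "x-staging-override"


@dataclass(frozen=True)
class Instance:
    service: str
    address: str


def parse_overrides(header_value: str) -> dict[str, str]:
    """Parse a hypothetical 'svc=host:port,svc2=host:port' header value."""
    overrides: dict[str, str] = {}
    for entry in header_value.split(","):
        service, sep, address = entry.strip().partition("=")
        if sep and service and address:
            overrides[service] = address
    return overrides


def pick_upstream(service: str, headers: dict[str, str],
                  discoverable: dict[str, list[Instance]]) -> Instance:
    """Choose where to send a call to `service` for this particular request."""
    overrides = parse_overrides(headers.get(OVERRIDE_HEADER, ""))
    if service in overrides:
        # Only requests that carry an override reach the offloaded instance.
        return Instance(service, overrides[service])
    # All other traffic uses the regular pool; a real proxy would
    # load-balance here, but the sketch simply takes the first endpoint.
    return discoverable[service][0]


pool = {"pricing": [Instance("pricing", "10.0.0.5:80")]}
print(pick_upstream("pricing", {}, pool).address)  # 10.0.0.5:80
print(pick_upstream("pricing", {OVERRIDE_HEADER: "pricing=10.0.9.9:80"}, pool).address)  # 10.0.9.9:80
```

Requests without the header, or with an override for a different service, keep using the regular pool, which is what lets many engineers share one staging environment.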

Lyft built its service mesh using Envoy, ensuring that all traffic flows through Envoy sidecars. When a service is deployed, it is registered in the service mesh, becomes discoverable, and starts serving requests from the other services in the mesh. An offloaded deployment contains metadata that stops the control plane from making it discoverable.
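In a simplified form, a control plane that hides offloaded deployments could look like the sketch below. It assumes a hypothetical `offloaded` metadata key and a `Registration` type defined here for illustration; it does not model Envoy's real discovery APIs. The offloaded instance keeps running, but it never appears in the endpoint lists handed to the sidecars, so ordinary traffic cannot find it.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    service: str
    address: str
    # Hypothetical metadata; the actual key Lyft uses is not described here.
    metadata: dict[str, str] = field(default_factory=dict)


def is_offloaded(reg: Registration) -> bool:
    return reg.metadata.get("offloaded") == "true"


def build_endpoints(registrations: list[Registration]) -> dict[str, list[str]]:
    """Endpoints the control plane would publish; offloaded ones are omitted."""
    endpoints: dict[str, list[str]] = {}
    for reg in registrations:
        if is_offloaded(reg):
            continue  # running, but not discoverable by the rest of the mesh
        endpoints.setdefault(reg.service, []).append(reg.address)
    return endpoints


regs = [
    Registration("pricing", "10.0.0.5:80"),
    Registration("pricing", "10.0.9.9:80", {"offloaded": "true"}),
]
print(build_endpoints(regs))  # {'pricing': ['10.0.0.5:80']}
```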

Engineers create offloaded deployments directly from their pull requests by invoking a specialised GitHub bot. Using Lyft's proxy application, they can add protobuf-encoded metadata to requests as OpenTracing baggage. This metadata is propagated across all services for the request's entire lifetime, regardless of the services' implementation language, the request protocol, or any queues in between. Lyft modified Envoy's HTTP filter to support staging overrides, so that it routes a request to the offloaded instance based on the request's override metadata.
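The sketch below illustrates why carrying the override as baggage lets it survive every hop. It rests on several assumptions: JSON plus base64 stands in for Lyft's protobuf encoding, and the `ot-baggage-` header prefix, the `staging-override` key, and all function names are hypothetical. Each service forwards the baggage it received on every outgoing call, and a proxy filter decodes it to look up an override for the destination service.

```python
import base64
import json
from typing import Optional

# Hypothetical names; JSON stands in for the protobuf encoding Lyft uses.
BAGGAGE_PREFIX = "ot-baggage-"
OVERRIDE_KEY = "staging-override"


def encode_override(routes: dict[str, str]) -> str:
    """Serialise {service: address} into a header-safe baggage value."""
    raw = json.dumps(routes, sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_override(value: str) -> dict[str, str]:
    return json.loads(base64.urlsafe_b64decode(value.encode()))


def outbound_headers(inbound: dict[str, str]) -> dict[str, str]:
    """Forward all baggage from the incoming request to a downstream call.

    Doing this at every hop (typically inside the tracing library) is what
    keeps the override attached to the request for its whole lifetime.
    """
    return {k: v for k, v in inbound.items() if k.lower().startswith(BAGGAGE_PREFIX)}


def override_for(service: str, headers: dict[str, str]) -> Optional[str]:
    """What a modified proxy filter might check before routing a call."""
    value = headers.get(BAGGAGE_PREFIX + OVERRIDE_KEY)
    return decode_override(value).get(service) if value else None


edge = {
    BAGGAGE_PREFIX + OVERRIDE_KEY: encode_override({"pricing": "10.0.9.9:80"}),
    "content-type": "application/json",
}
hop1 = outbound_headers(edge)  # service A calls service B
hop2 = outbound_headers(hop1)  # service B calls pricing
print(override_for("pricing", hop2))  # 10.0.9.9:80
print(override_for("users", hop2))  # None
```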

Engineers also used Onebox environments to run integration tests in CI. As the number of microservices grew, so did the number of tests and their running time, while the tests' efficacy diminished for the same reasons that led Lyft to abandon Onebox.

Lyft's engineers examined the existing end-to-end tests and discovered that only a subset of them represented critical business flows.

These critical tests were converted into acceptance tests. At the same time, Lyft moved from a model in which every service had its own set of integration tests to a small, centralised collection of end-to-end acceptance tests. Centralising the acceptance tests eliminated duplication, made it easier to keep the tests relevant, and allowed code to be reused across tests.

Lyft also decided to stop running end-to-end tests as part of the "inner loop" CI. Instead, acceptance tests run in the staging environment after each deployment.

The acceptance test results determine whether subsequent production deployments can proceed, effectively serving as a new production deployment gate.
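A deployment gate of this kind reduces to a small piece of control flow, sketched below under stated assumptions: the `gate_production` function, the `AcceptanceResult` type, and the test names are hypothetical, and the deploy and test steps are injected as callables rather than tied to any real CI system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AcceptanceResult:
    name: str
    passed: bool


def gate_production(
    version: str,
    deploy_to_staging: Callable[[str], None],
    run_acceptance_tests: Callable[[str], list[AcceptanceResult]],
    deploy_to_production: Callable[[str], None],
) -> bool:
    """Deploy to staging, run acceptance tests there, promote only on success."""
    deploy_to_staging(version)
    results = run_acceptance_tests(version)
    failures = [r.name for r in results if not r.passed]
    # Treat "no tests ran" as a failure so a broken run cannot open the gate.
    if not results or failures:
        print(f"blocking {version}: {failures or 'no acceptance tests ran'}")
        return False
    deploy_to_production(version)
    return True


promoted: list[str] = []
ok = gate_production(
    "v42",
    deploy_to_staging=lambda v: None,
    run_acceptance_tests=lambda v: [
        AcceptanceResult("hypothetical_request_ride", True),
        AcceptanceResult("hypothetical_complete_ride", False),
    ],
    deploy_to_production=promoted.append,
)
# ok is False and nothing was promoted to production.
```

One design choice worth calling out: an empty result list blocks the release, so a test run that silently executes nothing cannot let a change through.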

(Source: https://eng.lyft.com/scaling-productivity-on-microservices-at-lyft-part-4-gating-deploys-with-automated-acceptance-4417e0ebc274)

Lyft extended an existing traffic simulation engine so that engineers could also use it to run acceptance tests. Because an existing engine was being extended, tests are described using its custom configuration syntax. Otherwise, as Lyft's engineers note, "existing testing frameworks, such as Cucumber/Gherkin, might have served [Lyft] better if [they] were starting from scratch".

Since switching from integration tests running on every commit to pre-production acceptance tests, thousands of integration tests have been removed or converted into unit tests, pull requests are ready to merge within minutes, and there has been no measurable increase in the number of bugs reaching production.

On the subject of acceptance testing, Ben Linders interviewed Dave Farley on InfoQ about the relevance and benefits of automated acceptance testing.

About the Author

Vasco Veloso

This content is in the Developer Experience topic
