QCon NY: Matt Klein on Lyft Embracing Service Mesh Architecture

Matt Klein from the Lyft engineering team spoke at the QCon New York 2018 Conference about the Envoy service mesh architecture. Facing operational difficulties with their initial microservice deployment, Lyft migrated to a service mesh architecture.

Klein discussed how Lyft's architecture has evolved. Five years ago it was based on an Amazon Web Services (AWS) Elastic Load Balancer (ELB), a PHP/Apache monolith, and MongoDB as the backend database. When they started using microservices three years ago, they still had monolithic applications to maintain. Challenges with this model included multiple languages (PHP, Python, front-end code in Node, and Go services) and frameworks. They also had many protocols for connecting to MongoDB, Redis, and caching servers. The lack of consistent observability with regard to metrics/stats, tracing, and logging was another challenge. Capabilities like service retries, circuit breaking, rate limiting, and timeouts were not completely implemented.
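To make the last point concrete, here is a minimal sketch of the kind of retry and circuit-breaking logic each service would otherwise have to carry in its own language. It is an illustration only, not Lyft's or Envoy's implementation; the class name, thresholds, and the `call_with_retry` helper are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Toy breaker: opens after N consecutive failures, allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_after_s = reset_after_s          # hypothetical default
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: let one trial call through (half-open).
            # A single further failure re-opens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retry(breaker, fn, max_attempts=3):
    """Retry fn through the breaker; stop immediately once the breaker is open."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise
        except Exception as exc:
            last_error = exc
    raise last_error
```

Multiply this by every language and framework in use, and it becomes clear why partial, inconsistent implementations were the norm before the logic moved out of the services.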

Klein said that typically all problems come down to networking and observability, and so it's very important to have observability built into the solutions. At the same time, application development teams shouldn't be developing these capabilities from scratch in every project. Every time developers are not writing business logic or application code, they are wasting time.

Lyft's current application architecture is based on every service communicating with other services through Envoy. The idea behind a service mesh architecture is that the network is fully abstracted from the services. It is also abstracted from the developers; a sidecar proxy is colocated with each service running on localhost. A service will talk to its sidecar, which in turn talks to the sidecar of another service (instead of directly calling the second service) in order to perform service discovery, fault tolerance and implement tracing.
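From the application's point of view, the sidecar pattern can be sketched roughly as follows. This assumes the sidecar exposes a local HTTP listener and routes on the `Host` header; the port `9001`, the service name `rides`, and the helper `build_mesh_request` are hypothetical and not taken from the talk.

```python
import urllib.request

# Hypothetical address of the sidecar's local listener for outbound traffic.
SIDECAR_ADDR = "http://127.0.0.1:9001"


def build_mesh_request(service, path):
    """Build a request that goes to the local sidecar rather than the remote service.

    The destination is named only by the Host header. Discovery, load
    balancing, retries, and tracing are left to the proxy.
    """
    req = urllib.request.Request(SIDECAR_ADDR + path)
    req.add_header("Host", service)
    return req


# Usage: the application code never resolves or connects to "rides" itself.
req = build_mesh_request("rides", "/v1/status")
# urllib.request.urlopen(req, timeout=1.0) would send it through the sidecar.
```

The service only ever knows about localhost, which is what allows the network to be abstracted away from both the service and its developers.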

Envoy is an out-of-process architecture supporting the following capabilities:

L3/L4 filter architecture like a TCP/IP proxy
HTTP/2 based L7 filter architecture
Service discovery and active/passive health checking
Load balancing
Authentication and authorization
Observability

Envoy can be used as a middle proxy, a service proxy, and an edge proxy. An edge proxy is deployed on the internet gateway (the point of external ingress) and takes care of concerns like service discovery and load balancing.

Klein also discussed the developer experience after they started using Envoy. They provide auto-generated, consistent per-service dashboards for all services to help with troubleshooting issues. Dashboards include links to interesting data that developers can click to get more details. Developers can click on a dashboard and navigate to a trace UI to see which parts of the application are generating longer response times. The Lyft team currently has 100% trace coverage without any gaps between services. Each incoming client request gets a unique request identifier, which is used for correlation of logs. The dashboard also provides a service-to-service communication overview, with a drop-down for every service; developers can select the caller and callee services from the template to see where the errors are occurring. The global health dashboard, which supports 20K hosts at Lyft, is often the first stop to view the health of the system.
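The request identifier mechanism can be sketched from the application side. Envoy carries the identifier in the `x-request-id` header; everything else here (the helper names and the log format) is a hypothetical illustration of how a service might attach the ID to its logs and forward it on outbound calls.

```python
import logging
import uuid

REQUEST_ID_HEADER = "x-request-id"


def ensure_request_id(headers):
    """Return lower-cased headers with a request ID, generating one if absent."""
    out = {k.lower(): v for k, v in headers.items()}
    out.setdefault(REQUEST_ID_HEADER, str(uuid.uuid4()))
    return out


def request_logger(base, request_id):
    """Wrap a logger so every record carries the request ID for correlation."""
    return logging.LoggerAdapter(base, {"request_id": request_id})


def outbound_headers(incoming):
    """Headers to forward on calls to other services so the ID stays the same."""
    return {REQUEST_ID_HEADER: incoming[REQUEST_ID_HEADER]}


# Usage: a formatter such as "%(request_id)s %(message)s" then prints the ID
# on every line, so logs from different services can be joined on it.
headers = ensure_request_id({"X-Request-ID": "abc-123"})
log = request_logger(logging.getLogger("demo"), headers[REQUEST_ID_HEADER])
```

Forwarding the same identifier on every hop is what lets logs and traces from separate services be stitched together for a single client request.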

Klein went on to explain that Envoy's configuration management is based on the Discovery Service (xDS) APIs, which include the Listener Discovery Service (LDS) and the Cluster Discovery Service (CDS). Lyft supports a split architecture in which a legacy service handles service discovery and the new solution is used for new services. Lyft's current Envoy deployment includes hundreds of microservices, tens of thousands of hosts, and five to ten million mesh requests per second (RPS). All edge traffic, all service-to-service (StS) traffic, and the vast majority of external partner traffic are part of this deployment.
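The idea behind a discovery API such as CDS can be illustrated with a toy in-memory model: the proxy reports the configuration version it already has, and the management server returns a newer snapshot only if one exists. This is not the xDS wire protocol, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    version: str
    clusters: dict  # cluster name -> list of "host:port" endpoints


@dataclass
class ToyManagementServer:
    snapshot: Snapshot

    def fetch_clusters(self, known_version):
        """Return the current snapshot, or None if the caller is already up to date."""
        if known_version == self.snapshot.version:
            return None
        return self.snapshot


@dataclass
class ToyProxy:
    version: str = ""
    clusters: dict = field(default_factory=dict)

    def sync(self, server):
        """Apply a newer snapshot if one exists; return True if config changed."""
        update = server.fetch_clusters(self.version)
        if update is None:
            return False
        self.clusters = dict(update.clusters)
        self.version = update.version
        return True


# Usage: each proxy converges on the latest snapshot the next time it syncs.
server = ToyManagementServer(Snapshot("1", {"rides": ["10.0.0.1:8080"]}))
proxy = ToyProxy()
proxy.sync(server)
```

Because each proxy picks up changes on its own schedule, the fleet as a whole is only eventually consistent with the management server.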

From an adoption standpoint, Envoy is a community project that was released as open source in September 2016. Klein suggested developers should use Envoy because of its support for quality and velocity, extensibility, and an eventually consistent configuration API.
