Yelp Engineering: Using Services to Break Down a Monolith
The Yelp engineering team have stated that moving to a service-oriented architecture has allowed them to scale their development process and maintain a rapid pace of software delivery as the team and codebase have grown. They have achieved this by focusing on distributed systems education, creating a set of basic service design principles, defining a service interface specification, implementing a scalable approach to testing, encapsulating data stores within each service, and deploying a robust solution to service discovery.
According to the Yelp engineering blog, the engineering team value their ability to ship code rapidly. Changes are pushed to production continuously, and this has held true as the engineering team has grown to over 300 people and the codebase has reached several million lines of Python. One key factor in maintaining this iteration speed has been the move to a service-oriented architecture (SOA). Over the past three years the Yelp engineering team has written and deployed over seventy services to production.
The Yelp engineering blog proposes that creating a service-oriented architecture forces developers to confront the realities of distributed systems, such as partial failure and distributed code ownership. Yelp have attempted to mitigate many of these problems through the implementation and management of supporting platform infrastructure, in a similar fashion to Netflix and Twitter. However, the Yelp engineering team suggest that there is no substitute for helping developers understand the realities of the systems that they are building.
The Yelp engineering team have spread this knowledge throughout the team in several ways: creating a set of basic principles for writing and maintaining services, holding weekly ‘services office hours’ where any developer is free to drop in and ask questions about services, and conducting blameless postmortems to help the engineering team learn from mistakes.
The majority of services within Yelp expose HTTP interfaces and pass around structured data using JSON, which has both advantages and disadvantages:
There are definite tradeoffs in our choice to use HTTP and JSON. A huge benefit of standardizing on HTTP is that there is great tooling to help with debugging, caching and load balancing. One of the more significant downsides is that there’s no standard solution for defining service interfaces independently of their implementation (in contrast to technologies such as Thrift). This makes it hard to precisely specify and check interfaces, which can lead to nasty bugs (“I thought your service returned a ‘username’ field?”).
The engineering team at Yelp have addressed this problem by using Swagger. Swagger builds on the JSON Schema standard to provide a language for documenting the interfaces of HTTP/JSON services. Swagger UI can also provide a centralised directory of all service interfaces, which allows developers across the Yelp engineering team to easily discover which services are available and helps prevent duplicated effort.
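To illustrate how a machine-readable interface definition catches the “I thought your service returned a ‘username’ field?” class of bug, the sketch below checks a decoded JSON response against a Swagger-style schema. It is a minimal, hypothetical example: it handles only the JSON Schema keywords "type", "required" and "properties", the user service and its schema are invented for illustration, and it is not how Yelp or Swagger tooling actually validates interfaces (a real project would use a complete JSON Schema validator).

```python
import json

# Hypothetical response schema for a hypothetical user service (JSON Schema subset).
USER_SCHEMA = {
    "type": "object",
    "required": ["id", "username"],
    "properties": {
        "id": {"type": "integer"},
        "username": {"type": "string"},
    },
}

# Map JSON Schema type names onto the Python types produced by json.loads.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(instance, schema, path="$"):
    """Return a list of readable errors; an empty list means the instance conforms."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it unless "boolean" is expected.
        is_bool = isinstance(instance, bool) and expected != "boolean"
        if is_bool or not isinstance(instance, _TYPES[expected]):
            errors.append(f"{path}: expected {expected}, got {type(instance).__name__}")
            return errors
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required field '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


good = json.loads('{"id": 42, "username": "darwin"}')
bad = json.loads('{"id": 42, "user_name": "darwin"}')
print(validate(good, USER_SCHEMA))  # []
print(validate(bad, USER_SCHEMA))   # ["$: missing required field 'username'"]
```

Because the schema lives apart from the implementation, both the service author and its clients can check against the same definition, which is the gap the quoted paragraph says plain HTTP/JSON leaves open.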
The Yelp engineering blog explains that testing within a service typically follows a standard approach, including unit testing and integration testing with mocks. However, tests that span services can require complex orchestration. Yelp use Docker containers to spin up private test instances of services, including databases. The core idea is that service authors are responsible for publishing Docker images of their services, and other service authors can then pull in these images as dependencies when acceptance testing their own services.
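The pattern of spinning up a private instance of a dependency for an acceptance test can be sketched as follows. Yelp pull the dependency's published Docker image; to keep the example self-contained, this sketch swaps in an in-process HTTP server on an ephemeral localhost port as the stand-in for that container. The user service, its /users/<id> endpoint and the greeting_for function are all hypothetical.

```python
import json
import threading
import urllib.request
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeUserService(BaseHTTPRequestHandler):
    """Stand-in for a dependency's test image: GET /users/<id> returns JSON."""

    def do_GET(self):
        user_id = self.path.rsplit("/", 1)[-1]
        body = json.dumps({"id": int(user_id), "username": f"user{user_id}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextmanager
def private_instance(handler):
    """Start a private instance on a free localhost port; tear it down afterwards."""
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: the OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_address[1]}"
    finally:
        server.shutdown()
        server.server_close()


def greeting_for(user_id, users_url):
    """Code under test: a hypothetical service that depends on the user service."""
    with urllib.request.urlopen(f"{users_url}/users/{user_id}", timeout=5) as resp:
        user = json.load(resp)
    return f"Hello, {user['username']}!"


# Acceptance test: each run gets its own isolated dependency instance.
with private_instance(FakeUserService) as users_url:
    assert greeting_for(7, users_url) == "Hello, user7!"
```

With real containers the context manager would start and stop the published image instead, but the shape of the test stays the same: the dependency is private to the test run, so tests cannot interfere with each other or with shared environments.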
A significant proportion of Yelp's services need to persist data, and the engineering team use a combination of MySQL, Cassandra and Elasticsearch. Regardless of the choice of datastore, the Yelp engineering blog states that the primary goal is to keep the implementation details private to the owning service. This approach gives service authors the long-term flexibility to change the underlying data representation, or even to switch to a different datastore entirely.
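A minimal sketch of keeping the datastore private to the owning service: callers only see the service's methods and JSON-ready dictionaries, so the backend can change without any caller noticing. Here an in-memory dict and sqlite3 stand in for the MySQL, Cassandra or Elasticsearch stores mentioned above; the ReviewService, ReviewStore and related names are hypothetical.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class ReviewStore(Protocol):
    """Storage interface used only inside the owning service."""

    def put(self, review_id: int, text: str) -> None: ...
    def get(self, review_id: int) -> Optional[str]: ...


class InMemoryReviewStore:
    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def put(self, review_id: int, text: str) -> None:
        self._rows[review_id] = text

    def get(self, review_id: int) -> Optional[str]:
        return self._rows.get(review_id)


class SqliteReviewStore:
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")

    def put(self, review_id: int, text: str) -> None:
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO reviews (id, body) VALUES (?, ?)", (review_id, text)
            )

    def get(self, review_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT body FROM reviews WHERE id = ?", (review_id,)
        ).fetchone()
        return row[0] if row else None


class ReviewService:
    """The only component that touches the store; it exposes plain dicts."""

    def __init__(self, store: ReviewStore) -> None:
        self._store = store

    def create_review(self, review_id: int, text: str) -> dict:
        self._store.put(review_id, text)
        return {"id": review_id, "text": text}

    def get_review(self, review_id: int) -> Optional[dict]:
        text = self._store.get(review_id)
        return None if text is None else {"id": review_id, "text": text}


# The same caller code works unchanged against either backend.
for store in (InMemoryReviewStore(), SqliteReviewStore()):
    service = ReviewService(store)
    service.create_review(1, "Great tacos")
    assert service.get_review(1) == {"id": 1, "text": "Great tacos"}
    assert service.get_review(2) is None
```

Note that the column is named body in SQLite but text in the service's responses: the internal representation can diverge from the public one, which is the flexibility the encapsulation rule buys.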
A core problem in a service-oriented architecture is discovering the locations of other service instances. Yelp use Airbnb’s SmartStack, which takes the whole problem ‘out of band’ from the applications by using a sidecar process. SmartStack consists of two processes: Nerve, for service registration, and Synapse, for service discovery. The Yelp engineering blog states that each service host runs a Synapse-managed HAProxy instance bound to localhost. This HAProxy load balancer is configured dynamically from the Nerve service registration information, which is stored remotely in ZooKeeper. A service can then contact other services by connecting to its localhost load balancer, which proxies the request to a healthy instance of the desired service.
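The division of labour between Nerve and Synapse can be sketched as two small functions. This is a simplified illustration, not SmartStack's actual code or configuration format: a plain dict stands in for ZooKeeper, registration is driven by a boolean health-check result, and the service name, hosts and ports are hypothetical. The generated text uses standard HAProxy directives (listen, bind, mode, balance, server ... check).

```python
from typing import Dict, List, Tuple

# Stand-in for the ZooKeeper registry: service name -> list of (host, port).
Registry = Dict[str, List[Tuple[str, int]]]


def nerve_register(registry: Registry, service: str, host: str, port: int,
                   healthy: bool) -> None:
    """Nerve-like step: keep an instance registered only while its health check passes."""
    instances = registry.setdefault(service, [])
    entry = (host, port)
    if healthy and entry not in instances:
        instances.append(entry)
    elif not healthy and entry in instances:
        instances.remove(entry)


def synapse_haproxy_section(registry: Registry, service: str, local_port: int) -> str:
    """Synapse-like step: render a localhost-bound HAProxy section for one service."""
    lines = [
        f"listen {service}",
        f"    bind 127.0.0.1:{local_port}",
        "    mode http",
        "    balance roundrobin",
    ]
    for host, port in sorted(registry.get(service, [])):
        lines.append(f"    server {host}_{port} {host}:{port} check")
    return "\n".join(lines)


registry: Registry = {}
nerve_register(registry, "user_service", "10.0.0.5", 31001, healthy=True)
nerve_register(registry, "user_service", "10.0.0.6", 31002, healthy=True)
nerve_register(registry, "user_service", "10.0.0.6", 31002, healthy=False)  # failed check
print(synapse_haproxy_section(registry, "user_service", 20001))
```

A client on the same host would then simply send its requests to http://127.0.0.1:20001 and let HAProxy pick a healthy backend; the application itself never needs to know where instances live.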
The Yelp engineering blog post concludes by stating that work has begun on a next-generation service platform called Paasta, which uses Apache Mesos in combination with the Marathon framework to allocate containerized service instances across clusters of machines. Additional details about this project will be posted on the Yelp engineering blog later in the year.
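For a sense of what scheduling containerized instances through Marathon involves, the sketch below builds a Marathon application definition of the kind submitted to Marathon's REST API. The field names (id, instances, cpus, mem, container.type, container.docker.image) follow Marathon's app JSON format; the app id, image name and resource figures are hypothetical, and this is not Paasta's own configuration format, which the article does not describe.

```python
import json


def marathon_app(service: str, instance: str, image: str, instances: int,
                 cpus: float, mem_mb: int) -> dict:
    """Build a minimal Marathon app definition for a Docker-packaged service.

    Marathon app ids use lowercase letters, digits and hyphens in each path segment,
    so the hypothetical service name below is hyphenated.
    """
    return {
        "id": f"/{service}/{instance}",
        "instances": instances,  # how many copies Marathon keeps running on Mesos
        "cpus": cpus,            # CPU share requested per instance
        "mem": mem_mb,           # memory per instance, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


app = marathon_app("user-service", "main", "registry.example.com/user-service:1.0",
                   instances=3, cpus=0.5, mem_mb=512)
print(json.dumps(app, indent=2))
```

Marathon then asks Mesos for resource offers that fit these requirements and starts the requested number of containers across the cluster, restarting them if they fail.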
More detail on how the Yelp engineering team have used services to break down a monolith can be found on the Yelp Engineering blog.