Deliveroo has grown dramatically over the last few years, both in terms of business and IT, and is facing many technical challenges with its large monolithic application. The solution is to go distributed, but without microservices, Greg Beech noted in his presentation at the recent QCon London conference, where he described their move from a monolith to a distributed system.
Beech is lead engineer at Deliveroo, which was founded in 2013. They started with a typical Ruby on Rails monolith using PostgreSQL and Redis for data storage, and handled the growth in business by using larger and larger databases. One year ago, they were running about 20 servers on Heroku. Currently, they are running a few hundred servers, which is the largest application ever deployed on Heroku, at peak using 1800 cores and 3 TB of memory. They have grown from 10 engineers in 2015 to about 100 in 2017, working on a main codebase of 600,000 significant lines of code.
Using a monolith has been deliberate and has enabled them to quickly add features to adapt to market needs; however, they are now starting to face problems. With the current size of the monolith, they are suffering from degrading performance and from increasing build times, which have gone from seven minutes two years ago to over two hours today and have in turn reduced development velocity. A large monolith also decreases reliability, since a single problem can bring everything down.
The solution for them is to go distributed, and they are doing this by splitting the monolith into three classes of Twelve-Factor applications: Domain services, Edge services and Client apps for the UI, all supported by an event bus.
A Domain service:
- Owns a significant part of the domain that makes sense as a cohesive whole, which is why Beech doesn’t like to call them microservices.
- Exposes internal APIs that are real REST, with hypermedia.
- Sends events to and receives events from an event bus.
- Can use other domain service APIs.
An Edge service:
- Does not own any part of the domain.
- Exposes aggregated APIs to the outside world.
- Receives from an event bus.
- Can use other edge and domain service APIs.
There are no shared data stores; each application has its own data store which no other application can access, without exceptions. Instead, all data is exposed through REST APIs, and Beech notes that this is actual REST, with hypermedia. One example is collections, which they return as a set of links to the entities instead of as embedded objects. He also notes that RPC is not allowed.
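To illustrate the point about collections, here is a minimal sketch of a collection representation that links to its entities rather than embedding them. The talk summary does not name a hypermedia format, so the `_links` layout (loosely modelled on HAL-style documents), the `/orders` path and the function name `orders_collection` are all assumptions made for the example.

```python
import json


def orders_collection(order_ids, base="/orders"):
    """Build a collection that links to entities instead of embedding them.

    Hypothetical layout: the "_links" structure and the "/orders" path
    are assumptions, not something described in the talk.
    """
    return {
        "_links": {
            "self": {"href": base},
            # One link per entity; clients follow a link to fetch the entity.
            "item": [{"href": f"{base}/{order_id}"} for order_id in order_ids],
        }
    }


doc = orders_collection([101, 102])
print(json.dumps(doc, indent=2))
```

A client that needs the details of an order follows the corresponding `href`, so the collection stays small and each entity remains addressable on its own.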
Domain services send events through the event bus when entities are created, updated or deleted, a technique they call Representational State Notification (RESN). An event does not carry the entity as a payload, only a link to the entity the event relates to. One reason for this is to avoid the bus becoming a critical source of data loss. An exception, however, is that non-critical immutable value objects may be sent in the messages.
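The notification pattern described above could be sketched roughly as follows. This is not Deliveroo's implementation: the `EntityEvent` fields, the in-memory bus, the `/restaurants/42` link and the `fetch` helper (a stand-in for an HTTP GET against the owning domain service) are all hypothetical and only show the idea that an event carries a link and the consumer retrieves the current state itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityEvent:
    action: str  # "created", "updated" or "deleted"
    href: str    # link to the entity; the entity itself is not included


class InMemoryBus:
    """Toy stand-in for an event bus (assumption for this sketch)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


# Stand-in for the REST API of the domain service that owns the data.
RESTAURANTS = {"/restaurants/42": {"name": "Example Diner", "open": True}}


def fetch(href):
    """Hypothetical replacement for an HTTP GET on the entity link."""
    return RESTAURANTS[href]


seen = []


def on_event(event):
    # The consumer follows the link to read the current state,
    # unless the entity no longer exists.
    state = None if event.action == "deleted" else fetch(event.href)
    seen.append((event.action, event.href, state))


bus = InMemoryBus()
bus.subscribe(on_event)
bus.publish(EntityEvent("updated", "/restaurants/42"))
```

Because the consumer always reads the entity from its owner, a lost or delayed event means a missed notification rather than lost data, which fits the reasoning given for keeping payloads off the bus.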
Beech notes that although they have quite strong guidelines for how to build services, how to layer them and how they communicate, teams can start simple and evolve into a more complex architecture as they need to. The reason for this is to enable teams to work like start-ups with their own problems and goals, to allow them to evolve their architecture as needed, and to succeed even with limited experience of distributed architectures.