Migrating a Retail Monolith to Microservices: Sebastian Gauder at MicroXchg Berlin

In his presentation at MicroXchg in Berlin, Sebastian Gauder described how he and his teams migrated an existing food retail monolith at REWE, a large German company, into several business domains with 270 microservices, while increasing the number of teams from two to almost 50. He also discussed the design goals and rules they set up to make this possible.

Gauder, software architect at REWE digital, began by describing how the journey started in 2014, when two teams took over an existing monolith. To be able to scale to 20-30 teams they decided to migrate to a microservices architecture, but he emphasized that the reason for this was the ability to increase the number of teams working on the system, not that it was cool from a technology perspective. By 2016 they had grown to around 20 teams and released an app supporting the whole ecommerce process. In 2018, they went live with 48 teams working on the whole system, which is the number of teams they have today. As the number of teams increased, the number of services grew from 1 to 270.

When they started with microservices they set two major design goals:

  • Decentralization. They wanted units of software that teams could work on, deploy and operate independently of all other teams.
  • Vertical boundaries. They wanted technical layers like frontend, backend and data storage handled in each service, thus avoiding technical teams for each layer working across all services.

From an architectural and domain perspective they use domains, subdomains and bounded contexts, all key concepts from Domain-Driven Design (DDD). These are then implemented using services based on a common platform. Their organisation is based on the Tribe, Squad and Team concepts from Spotify:

[Figure: Architectural and organisational model]

To determine the boundaries in the system they let the customer journey define the subdomains, which led to four areas within the ecommerce domain: customer check-in, product discovery, checkout and fulfilment. They also ended up with a few common subdomains that didn’t fit in any of the others: product information, back office and one for their mobile application.

Later on, they realized that the fulfilment subdomain was growing too large and therefore decided to move it out of the ecommerce domain into a new fulfilment core domain with four subdomains created according to how goods are brought to a customer: inbound for goods coming in to storage, inventory, outbound for loading goods on trucks and realization for delivering to customers. In this domain they also have some common blocks like master data and back office.

With almost 50 teams, each working on the whole technology stack from frontend to data storage within a given context, it's crucial to make sure all services work together. To accomplish this, they have created some guarding rules:

  • Design goals (as already mentioned)
  • Architectural principles, consisting of nine basic rules around autonomy, automation and communication
  • Guides describing how to perform common tasks like handling events, REST and authentication

The autonomy principles state that a team can work and deploy independently; it should never have to wait for, or synchronize with, another team. Implementation details should be hidden from other teams, and failures should be isolated within services to make them resilient. The principles also state that for each data storage there must be exactly one service responsible.
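
The "exactly one responsible service per data storage" rule lends itself to an automated check. The following is a minimal sketch, not something shown in the talk: it assumes each service declares the data stores it writes to, and the function name, service names and store names are all hypothetical.

```python
from collections import defaultdict


def find_ownership_violations(declarations):
    """Return data stores that are claimed by more than one service.

    `declarations` is an iterable of (service_name, data_store_name) pairs.
    The result maps each contested store to a sorted list of its claimants.
    """
    owners = defaultdict(set)
    for service, store in declarations:
        owners[store].add(service)
    # Keep only stores with more than one distinct owner.
    return {
        store: sorted(services)
        for store, services in owners.items()
        if len(services) > 1
    }


# Hypothetical service and store names, for illustration only.
declarations = [
    ("basket-service", "basket-db"),
    ("checkout-service", "order-db"),
    ("reporting-service", "order-db"),  # second owner: violates the rule
]
violations = find_ownership_violations(declarations)
```

A check like this only catches stores with several owners; a store with no declared owner would need a separate inventory of data stores to compare against.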

The first rule concerning automation is that scaling must be horizontal and done automatically. Teams should also embrace a culture of automation, automating testing, deployment and operations as much as possible. They are encouraged to deploy to production early and often, but also to be able to roll back quickly in case of errors. To enable this, services must be highly observable.

For all teams, communication is standardized and asynchronous where possible. They use REST (maturity level 2, without hypermedia) for synchronous communication and Kafka for asynchronous communication. Gauder points out three lessons they have learnt during their work:

  • Events and REST are different views on an API, and both must be implemented from the start. Events must also behave like an API and avoid breaking changes.
  • Writing generic APIs is hard. Too often teams write APIs targeting specific clients, like mobile apps.
  • Breaking changes must be solved by introducing new endpoints or topics.
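
To make the last lesson concrete, here is a minimal sketch of how a team might decide whether a changed event can stay on its existing topic or needs a new one. The compatibility rule (removing a field or changing its type is breaking, adding a field is not), the `-v<N>` topic naming and all field names are assumptions for illustration; the talk does not describe REWE's actual conventions.

```python
def is_breaking_change(old_fields, new_fields):
    """Treat a change as breaking if a field that consumers may rely on
    is removed or changes type. Adding new fields is considered safe.

    Both arguments map field names to type names, e.g. {"orderId": "string"}.
    """
    for name, field_type in old_fields.items():
        if new_fields.get(name) != field_type:
            return True
    return False


def next_topic(base_name, version, old_fields, new_fields):
    """Return (topic_name, version) on which to publish the new event shape.

    A breaking change moves to a new topic so existing consumers can keep
    reading the old one; a compatible change stays on the current topic.
    """
    if is_breaking_change(old_fields, new_fields):
        version += 1
    return f"{base_name}-v{version}", version


# Hypothetical event shapes, for illustration only.
order_v1 = {"orderId": "string", "total": "number"}
additive = {"orderId": "string", "total": "number", "currency": "string"}
renamed = {"orderId": "string", "totalCents": "integer"}

same_topic = next_topic("order-placed", 1, order_v1, additive)
new_topic = next_topic("order-placed", 1, order_v1, renamed)
```

The same reasoning applies to REST: a breaking change to a resource would be exposed under a new endpoint while the old one stays available to existing clients.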

To create a frontend for all their services, they use dynamic composition: each service provides the data it's responsible for, and that data is then composed into a fully functional web page. Gauder notes that this enables teams to deploy new functionality as they like; they don't have to first get approval from a frontend team. To achieve this, they have created a Dynamic UI Composition (DUC) component written in Node.js. When a request arrives, DUC reads the template corresponding to the route, fetches the data from the services referenced in the template, and finally builds the page and returns it to the customer. Gauder thinks this is a more DDD-like solution than a monolithic frontend. He also notes that after some work optimizing the parsing of data and the use of caching, he believes they now have a performant web site. They have published a repository with integration patterns, and later this year they hope to be able to open source their DUC library.
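
The request flow of such a composition component can be sketched in a few lines. This is not the DUC implementation, which is written in Node.js and has not been published; it is a simplified Python illustration in which the `{{fragment:<name>}}` placeholder syntax, the function names and the empty-string fallback for a failing service are all assumptions.

```python
import re

# Assumed placeholder syntax: {{fragment:<name>}} marks where a service's
# output is included in the page template.
PLACEHOLDER = re.compile(r"\{\{fragment:([a-z0-9-]+)\}\}")


def compose_page(route, templates, fragment_providers):
    """Build a page for `route` by filling its template with fragments.

    `templates` maps routes to template strings.
    `fragment_providers` maps fragment names to zero-argument callables
    that return markup; in a real system each would call a service.
    """
    template = templates[route]  # raises KeyError for an unknown route

    def render(match):
        provider = fragment_providers.get(match.group(1))
        if provider is None:
            return ""
        try:
            return provider()
        except Exception:
            # Assumed fallback: a failing service leaves a gap in the page
            # instead of breaking the whole response.
            return ""

    return PLACEHOLDER.sub(render, template)


def unavailable():
    """Stand-in for a service call that fails."""
    raise TimeoutError("service unavailable")


# Hypothetical route, template and fragments, for illustration only.
templates = {
    "/product": "<main>{{fragment:product-detail}}{{fragment:recommendations}}</main>",
}
providers = {
    "product-detail": lambda: "<h1>Apples</h1>",
    "recommendations": unavailable,
}
page = compose_page("/product", templates, providers)
```

Because each fragment comes from the owning service, a team can change its part of the page by deploying its own service, without touching the template of another team. A production version would fetch fragments concurrently and cache them, which is where the optimization work Gauder mentions would apply.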

The slides from the presentation are available. Most presentations at the conference were recorded and will be available over the coming months.