QCon London: Scaling Microservices Architecture and Technology Organization at Trainline

During the recent QCon London conference, Trainline’s CTO spoke about the evolution of the company’s system architecture and organizational structure over the last five years. The company had to adapt to market changes and growing customer expectations by improving the performance and reliability of its technology platform.

Milena Nikolic, the CTO at Trainline, provided an overview of Trainline’s business, revealing the company is responsible for £5 billion in ticket sales and is experiencing peaks of 350 searches per second for 3.8 million unique routes available on its platform. The main challenges for the company are API integrations and spikes in demand from users buying tickets shortly before traveling.

Nikolic went on to share three scaling stories that Trainline has experienced over recent years, starting with scaling the team's productivity. Before 2022, the teams at Trainline were shaped around the ownership of parts of the technical stack, which negatively impacted technology and business alignment and teams’ productivity.

Since then, the organization has evolved to better align technology with the product. Now, 50% of the overall technology stack is owned by the core platform team, while the rest is split into product-aligned vertical teams. Nikolic admitted that applying the Inverse Conway Maneuver may not always be immediately successful, and companies need to try out different approaches before finding the optimal structure.

The second scaling story revolved around cost optimization. A substantial portion of Trainline’s overall IT budget goes toward cloud infrastructure bills, so keeping those costs under control has been crucial for the company. Nikolic shared lessons learned from driving the cost reduction push within the company.

She offered the attendees a few areas to look at to reduce cloud costs, including consolidating non-production environments, right-sizing service provisioning, reviewing architectural choices (containers vs. cloud functions, etc.), and removing/archiving old or unused data. Trainline’s CTO advised against blanket cost reduction targets, which can force teams into potentially harmful decisions in order to cut costs where little margin for saving exists.

The last area discussed was architecture scalability. Nikolic recounted three outages that the company experienced between 2021 and 2023 and delved into the details of each one, providing her analysis of root causes. The speaker shared the lessons the company gathered over the years and advised the audience on best practices to ensure the scalability of microservice-based architectures.

Nikolic recommended regularly monitoring and reviewing long-term traffic trends instead of only focusing on release-level performance analysis. Additionally, she pointed out that fleet coordination is critical for microservices, so architecture and engineering leadership has to guide teams on retry strategies, scaling policies, and DB connection pool configuration to avoid scalability-related issues and outages.
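
To make the fleet-coordination point concrete, here is a minimal Python sketch of two of the settings mentioned: a bounded retry policy with capped exponential backoff and jitter, and a worst-case connection count for an autoscaled service. This is an illustration only, not Trainline's implementation; the function names, default values, and the numbers in the usage note are all assumptions made for the example.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random):
    """Delay in seconds for a 0-based retry attempt.

    Capped exponential backoff with full jitter: the upper bound doubles
    on every attempt up to `cap`, and the actual delay is drawn uniformly
    below it so that many clients do not retry in lockstep.
    (base and cap are illustrative defaults, not recommended values.)
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(operation, max_attempts=4, sleep=None, rng=random):
    """Call operation(), retrying on any exception a bounded number of times.

    The last failure is re-raised so callers still see the error instead
    of retrying forever and amplifying load on a struggling dependency.
    """
    sleep = sleep or time.sleep
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, rng=rng))


def max_db_connections(max_instances, pool_size_per_instance):
    """Worst-case connections one service can open when fully scaled out."""
    return max_instances * pool_size_per_instance
```

For example, a hypothetical service allowed to autoscale to 40 instances with a pool of 25 connections each could open up to `max_db_connections(40, 25)`, i.e. 1,000 connections, which has to fit within the database's connection limit alongside every other service that shares it. That is why pool sizes and scaling policies benefit from being reviewed together across the fleet rather than set independently by each team.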