The re-architecture to SOA at Airbnb improved the performance of the services and site reliability. Faster build and deploy times led to increased developer productivity, and improving clarity and boundaries for ownership increased efficiency. Jessica Tai, a software engineer at Airbnb, presented Airbnb’s Great Migration: Building Services at Scale at QCon London 2019.
Airbnb’s engineering team grew rapidly, but the ability to develop efficiently and reliably in the monolith did not scale gracefully with the team, said Tai. With hundreds of engineers contributing and deploying to the monolith, deploying code began to take longer (often on the order of hours). The monolith was becoming more tightly coupled, making it difficult to navigate, code, and debug, and ownership was becoming further muddled. Engineer productivity was decreasing while frustration with the development life cycle was increasing.
At QCon SF 2018 Tai presented an overview of trade-offs and motivation for the SOA migration at Airbnb in The Great Migration: from Monolith to Service-Oriented. Takeaways include being prepared for a long migration journey, comparing and migrating incrementally, scaling services through standardization and autogenerated code, and investing heavily in frameworks, tools, and documents. SOA has improved Airbnb’s build and deploy times in addition to lowering latency and improving reliability. More details can be found in the InfoQ write-up Airbnb’s Migration from Monolith to Services.
Tai mentioned that service-oriented architecture and microservices have separate build and deploy processes per service. Ownership is more clearly defined by the scope and supported API of each service. SOA seemed promising for improving developer productivity and velocity.
The decomposition of the monolith has forced engineers to think critically about ownership of various features and sections of code, said Tai. In the SOA, each service is owned by a single team, which has designated engineers accountable for the service. Clearer ownership boundaries have helped the team execute on the product more efficiently, she said.
InfoQ spoke with Jessica Tai about design principles and practices for SOA, the benefits Airbnb got from the re-architecture to SOA, and what Airbnb has done to shift to a product culture to empower migration work.
InfoQ: What design principles and SOA good practices help you to build scalable, performant services?
Jessica Tai: Some of our design principles include:
- Encapsulated access to data storage with a single service as its owner
- Services should address a specific concern
- Avoid duplicate functionality
- Data mutations published via standard events
- Build with best practices for production, including technical architecture, observability, and alerts (avoid cutting corners for "prototypes", "only admins", or aggressive deadlines)
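As an illustration only (this is not Airbnb's code), the first and fourth principles above can be sketched in a few lines of Python. `HomesService`, `MutationEvent`, and the in-memory subscriber list are hypothetical names standing in for a real data service and event bus; the point is just that one service owns the store and every write emits an event in a single standard shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MutationEvent:
    """Hypothetical standard shape for every data-mutation event."""
    entity: str
    entity_id: int
    action: str  # "created" or "updated"
    data: dict


class HomesService:
    """Hypothetical sole owner of homes data; the store is private to it."""

    def __init__(self) -> None:
        self._store: Dict[int, dict] = {}
        self._subscribers: List[Callable[[MutationEvent], None]] = []

    def subscribe(self, handler: Callable[[MutationEvent], None]) -> None:
        # Other services react to changes instead of reading the store.
        self._subscribers.append(handler)

    def get_home(self, home_id: int) -> dict:
        # Return a copy so callers cannot mutate the owned data directly.
        return dict(self._store[home_id])

    def upsert_home(self, home_id: int, data: dict) -> MutationEvent:
        action = "updated" if home_id in self._store else "created"
        self._store[home_id] = dict(data)
        event = MutationEvent("home", home_id, action, dict(data))
        for handler in self._subscribers:
            handler(event)
        return event
```

A consumer such as a search indexer would call `subscribe` with its own handler and never touch `_store`; in a real system the subscriber list would presumably be replaced by a message queue.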
Good practices include:
- Standardization for consistent APIs, observability, client and server functionality, plus build and deploy processes
- Invest in tools and solutions to autogenerate code instead of relying on humans to manually write boilerplate and shared frameworks
- Include reliability and robustness features in autogenerated clients and servers; fail fast
- Perform comparisons of monolith and service request life cycles asynchronously to avoid adding extra latency to production paths
- Make RPCs to dependencies asynchronously to get performance benefits due to parallelization
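The last two practices can be sketched with Python's `asyncio`; this is a minimal illustration under stated assumptions, not Airbnb's implementation (which, per the interview, uses Java services). All function names, the simulated latencies, and the `comparison_log` list are hypothetical. The dependency calls run concurrently via `asyncio.gather`, and the comparison against the monolith path is scheduled as a background task so the response does not wait for it.

```python
import asyncio

# Hypothetical record of (home_id, responses_match) pairs.
comparison_log = []
# Keep references so background tasks are not garbage-collected early.
background_tasks = set()


async def fetch_listing(home_id: int) -> dict:
    await asyncio.sleep(0.05)  # stands in for network latency
    return {"id": home_id, "title": "Loft"}


async def fetch_reviews(home_id: int) -> list:
    await asyncio.sleep(0.05)
    return ["Great stay"]


async def fetch_host(home_id: int) -> dict:
    await asyncio.sleep(0.05)
    return {"name": "Sam"}


async def build_home_page(home_id: int) -> dict:
    # Issue all three RPCs at once; total wait is roughly the slowest call.
    listing, reviews, host = await asyncio.gather(
        fetch_listing(home_id), fetch_reviews(home_id), fetch_host(home_id)
    )
    return {"listing": listing, "reviews": reviews, "host": host}


async def legacy_monolith_page(home_id: int) -> dict:
    # Stand-in for the monolith path: the same calls, one after another.
    listing = await fetch_listing(home_id)
    reviews = await fetch_reviews(home_id)
    host = await fetch_host(home_id)
    return {"listing": listing, "reviews": reviews, "host": host}


async def compare_with_monolith(home_id: int, service_response: dict) -> None:
    monolith_response = await legacy_monolith_page(home_id)
    comparison_log.append((home_id, monolith_response == service_response))


async def handle_request(home_id: int):
    response = await build_home_page(home_id)
    # Schedule the comparison off the request path and return immediately.
    task = asyncio.create_task(compare_with_monolith(home_id, response))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return response, task
```

`handle_request` returns the task only so a caller (or a test) can await it; a production handler would more likely return the response alone and surface mismatches through metrics or logs.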
InfoQ: What benefits has Airbnb experienced from the re-architecture to SOA?
Tai: Our first SOA-related service was focused on core homes data and its intent was to work behind the scenes with no changes to how other engineers were reading and writing homes data in the monolith. Once we began developing a couple more core data services, we were able to reap the benefits in our presentation services, starting with our home description page presentation service.
Improved performance was one of our earlier benefits, because we built the new services in multi-threaded Java so they could call other services asynchronously (whereas in our Ruby monolith, more dependencies were executed synchronously).
The developer productivity benefits, including faster build and deploy times, came later as we were still building our service frameworks and developer infrastructure teams to support SOA.
InfoQ: What has Airbnb done to shift to a product culture to empower migration work?
Tai: For the migration to services to be complete and successful, the whole company needs to be aligned and committed; it is not a purely technical initiative.
To minimize the impact on the speed of product development, Airbnb invested in making service building quick, simple, reliable, and robust. It would be impossible to gain momentum across all of the engineering teams if building a service and coding a feature in a service were an order of magnitude slower than coding and deploying in the monolith.
Treating migration work as a high priority alongside new feature launches, and celebrating its milestones, has brought awareness to this massive re-architecture. Airbnb has included SOA work in our company goals, showing its commitment to seeing this long journey through to the end.
InfoQ is covering QCon London 2019 with Q&As, summaries, and articles.