ScholarPack has migrated away from its monolithic backend using the Strangler Fig pattern, applying incremental development and continuous delivery to target customers' needs while gradually strangling the monolith.
Gareth Thomas spoke about the migration of their high-traffic monolith toward microservices at Lean Agile Exchange 2020.
The key decision that enabled the Strangler Fig pattern to work successfully was the order of approach, as Thomas explained:
As the project commenced there was no experience of writing a production Flask application (the framework chosen to replace Zope). It was decided to write a greenfield project in Flask first - a customer-facing REST API. This enabled us to solve a large number of deployment and architecture issues without the risk of changing an existing product.
In parallel to this we developed what we know as "the wrapper". This boundary service now sits between all customer traffic and the backend services. This enables us to abstract away the changes from the user - the wrapper is transparent.
Putting the wrapper in place without moving anything out of Zope allowed them to resolve all user-session and routing issues before having to solve routing between internal services. The way domain names work with ScholarPack customers also enabled them to move customers into this new service in small blocks.
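Conceptually, a wrapper like this can be a thin reverse proxy that inspects each request and forwards it either to the legacy Zope backend or to a migrated service. The sketch below illustrates the idea in Flask; the path prefixes, service URLs and routing table are hypothetical stand-ins rather than ScholarPack's actual configuration.

```python
# A minimal sketch of a boundary "wrapper" service, assuming simple
# path-prefix routing. All backend URLs and prefixes are hypothetical.
import requests
from flask import Flask, Response, request

app = Flask(__name__)

# Migrated prefixes go to new services; everything else falls through
# to the legacy Zope backend, so the cutover stays invisible to users.
ROUTES = {
    "/assessment": "http://assessment-service:5000",
    "/reporting": "http://reporting-service:5000",
}
LEGACY_ZOPE = "http://zope-backend:8080"

def backend_for(path: str) -> str:
    for prefix, url in ROUTES.items():
        if path.startswith(prefix):
            return url
    return LEGACY_ZOPE

@app.route("/", defaults={"path": ""}, methods=["GET", "POST"])
@app.route("/<path:path>", methods=["GET", "POST"])
def proxy(path):
    # Forward the request to whichever backend currently owns the path.
    upstream = requests.request(
        method=request.method,
        url=f"{backend_for('/' + path)}/{path}",
        headers={k: v for k, v in request.headers if k.lower() != "host"},
        data=request.get_data(),
        cookies=request.cookies,
        allow_redirects=False,
    )
    # Strip hop-by-hop headers before relaying the response.
    excluded = {"content-encoding", "content-length",
                "transfer-encoding", "connection"}
    headers = [(k, v) for k, v in upstream.headers.items()
               if k.lower() not in excluded]
    return Response(upstream.content, upstream.status_code, headers)
```

With a layer like this in place, a backend can be swapped behind a prefix without the customer ever seeing a different host.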
From that point onwards, the migration has followed a needs-based approach, Thomas said:
Modules are selected based upon required changes rather than usage. We are very adept at keeping Zope servers running, and the majority of the risk comes from changing the codebase. Therefore, any significant feature would be delivered by first migrating the relevant module and then adding the feature to it. All new modules or extensive new features are done in Flask. Zope still exists within the infrastructure, but it is now slow changing.
The stability of the running Zope modules results in a limited business need to accelerate the migration. Any work within the Zope codebase introduces risk, but the inverse is also true - the Zope code is battle-tested and stable, Thomas said. To rewrite stable features for no real benefit beyond the removal of Zope adds the risk of bugs and lost functionality, and also takes resources away from revenue-generating work.
InfoQ interviewed Gareth Thomas about their migration towards microservices.
InfoQ: What made ScholarPack decide to migrate away from their monolith backend?
Gareth Thomas: The original version of ScholarPack was based upon a legacy Python framework called Zope. As technology advanced, ScholarPack became stuck, unable to upgrade. ScholarPack and Zope were deeply entwined, and changing to a new framework was not a simple job. Moving away from Zope was a must, but we needed to avoid a complete rewrite, as those are doomed to failure. A microservice migration suited the requirements of the business and allowed a phased approach.
InfoQ: What approach was used for developing services and how did it work out?
Thomas: We took an API-first approach to the development of the services, with a central "monolithic" API that mirrored the database of the application. This is namespaced around functionality, but is still a single service. Microservice purists are aghast, I know. But this enabled the database to remain unchanged, and prevented us from needing to implement complex data models in several services.
With this API behind them, the majority of the services generate their own HTML and send this to the customer via the wrapper.
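As an illustration, a central API namespaced around functionality could be laid out with one Flask Blueprint per area, all served from a single process over the unchanged database. The resource names and stub responses below are hypothetical.

```python
# A minimal sketch of a single, namespaced API service. Endpoints and
# data are hypothetical stubs; a real service would read from the
# shared database that the API mirrors.
from flask import Blueprint, Flask, jsonify

students = Blueprint("students", __name__, url_prefix="/students")
assessments = Blueprint("assessments", __name__, url_prefix="/assessments")

@students.get("/<int:student_id>")
def get_student(student_id):
    return jsonify({"id": student_id, "name": "example"})

@assessments.get("/<int:assessment_id>")
def get_assessment(assessment_id):
    return jsonify({"id": assessment_id, "score": 0})

# One process, one database, several namespaces: the services behind
# the wrapper call these endpoints instead of sharing data models.
app = Flask(__name__)
app.register_blueprint(students)
app.register_blueprint(assessments)
```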
I would probably describe this as a series of mini-monoliths, split across business modules, or Single Responsibility Services. I like to think about conceptual boundaries of responsibility, beyond which the service hands off to another service. These responsibilities are somewhat broad, and definitely arbitrary, to serve the needs of the development team.
I am firmly of the belief that there is no single correct architecture. The only real measure of success is running software in a production environment. Following that, the ease and safety of making changes, and then the ability to limit the side effects and close coupling between architectural components become important - but only in that they allow for easier development and deployment.
ScholarPack has a small number of these services, with broad areas of responsibility like "assessment", "reporting", "student management" and "parent management". In some cases these are almost monolithic products that could potentially be sold as standalone offerings. Having these separations allows for easier development, as you can move different parts of the system at different speeds and avoid the risks of large deployments. At the same time, keeping the modules quite large reduces the cognitive load on the technical teams: any developer or sysadmin can easily keep all the components that go into the delivery of a suite of features in their head. On a small team this reduction in complexity and cognitive load can help speed things up.
That said, we have left seams throughout the code: logical breaks that would allow services to be split further in the future. This has been helped through extensive use of Flask Blueprints and good adherence to SOLID principles, which reduce coupling within the services.
Each service is forked from a starting code base called "the skeleton" that understands the permissioning systems, the frontend generation and how to communicate with the API. Shared logic is within a series of maintained and versioned Python libraries, many of which have been open sourced and are available on GitHub and PyPI.
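A skeleton-derived service might look roughly like the sketch below, assuming an app-factory layout; the Blueprint, endpoint and module names are hypothetical, and the shared permissioning and API-client libraries are only hinted at in comments.

```python
# A minimal sketch of a service forked from "the skeleton". Each
# Blueprint is a deliberate seam: a logical break that would let a
# business area be lifted into its own service later. Names here are
# hypothetical.
from flask import Blueprint, Flask

# In a real fork this Blueprint would live in its own module,
# e.g. reports/views.py, keeping the seam physical as well as logical.
reports_bp = Blueprint("reports", __name__)

@reports_bp.get("/summary")
def report_summary():
    # Stub endpoint; real views would render HTML via the shared
    # frontend-generation code and fetch data through the API client.
    return {"status": "ok"}

def create_app() -> Flask:
    app = Flask(__name__)
    # The skeleton would also wire in permissioning and the versioned
    # API-client library here (omitted in this sketch).
    app.register_blueprint(reports_bp, url_prefix="/reports")
    return app
```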
InfoQ: How do you apply incremental development with continuous delivery?
Thomas: Each module is small and self-contained. This has allowed a rapid release cycle, with pipelines driving the deployment into live.
One of the few benefits of the Zope framework is that the fragile nature of the software has forced us to work in small increments and ship in frequent small releases. Having unreleased code lying around for more than a few hours has led to incidents around deployment, like accidental releases or code being overwritten. So the philosophy has been "write it and ship it immediately".
Things like feature toggles and atomic releases were second nature. Therefore, when we designed the wrapper and the new service architectures, feature toggles were baked in from the start (if a little crude in the first cuts). As a result, from the early days of the project, code was being pushed to live within hours of being committed.
Moving to a framework like Flask enabled "proper" CI pipelines, which can perform actual checks on the code. Whilst a deployment into production is manually initiated, all other environment builds and deployments are initiated by a commit into a branch.
The aim is to keep the release cadence the same as it has been with Zope. Changes are small, with multiple small deployments a day rather than massive "releases". We then use feature toggles to enable functionality in production.
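A first-cut toggle in this style can be as simple as a flag checked before a route exposes new behaviour. The sketch below assumes flags read from environment variables; the flag and endpoint names are hypothetical.

```python
# A minimal sketch of a feature toggle: the code is deployed to
# production immediately but stays dark until its flag is switched on.
# Flag source and names are hypothetical.
import os
from flask import Flask, abort

app = Flask(__name__)

def toggle_enabled(name: str) -> bool:
    # First cut: read toggles from the environment; a real system
    # might back this with a config service or database instead.
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"

@app.get("/new-report")
def new_report():
    if not toggle_enabled("new_report"):
        abort(404)  # deployed but not yet enabled for customers
    return {"report": "new version"}
```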
InfoQ: What have you learned from the migration?
Thomas: Some of the things we learned are:
- Always check that you are following Rule 1 - is it better than it was? So many times on large projects, perfect is the enemy of the good, and we are not looking for perfection; we are looking for better. Something working, maintainable and live is better than the perfect solution that will never see the light of day.
- The best architectures emerge. I know that this is quoting the Manifesto, but it is so true. There are no textbook answers to moving a legacy product. There are many design decisions that would make purists twitch, but in context they unblocked an issue and made the system better. Those rough edges, the technical debt that we knowingly incurred to simplify our work, can now be smoothed in a CI-driven, unit-tested, phased-release estate, and the customers are already feeling the benefit of an improved experience.
- Choosing the correct first modules makes or breaks a project. Picking something completely outside the "rewrite" project as the starting point allowed a lot of questions to be resolved in a low-stakes way. Everything we learnt building that initial external API went directly into building the new product.
- Pivot. If an idea is not working, cut it loose sooner rather than later. We lost a lot of time because we believed a service should be in control of its own data. This resulted in horrible models that needed to be shared - the coupling was horrendous. Acknowledging that we had a legacy that complicated this approach and going with a core API made everything less coupled and increased quality and velocity.