Monolith Decomposition Patterns

Summary

Sam Newman shares some key principles and a number of patterns to use to incrementally decompose an existing system into microservices. He covers patterns that can work to migrate functionality out of systems that are hard to change, and looks at the use of strangler patterns, change data capture, database decomposition and more.

Bio

Sam Newman is an independent consultant specializing in helping people ship software fast. He has worked extensively with the cloud, continuous delivery, and microservices and is especially preoccupied with understanding how to more easily deploy working software into production. For the last few years, he has been exploring the capabilities of microservice architectures.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Newman: We're going to be talking today about microservice decomposition patterns, hence the nasty jellyfish-type thing: jellyfish are tangled entities that also sting us and might kill us, which has a lot in common with the average enterprise microservice migration. As Nicky said, I've written some books, and I do some consulting and advisory work. You can find out about the work I do on my company's website; I won't talk more about it now. I've got a brand-new book, "Monolith to Microservices," which came out at the end of last year. If you want more detail on what we're going to talk about today, it's all in the book.

We're here to talk about microservices, or more specifically, how we get to them. A lot of people are running around doing microservices: "I want to microservice, you want to microservice, we want to microservice. We're going to roll around in the microservices, or make sweet microservice enterprise digital transformation together." Scratch the surface of any digital transformation going on around the world, and microservices are just below it. You know that digital transformation is a big thing right now, because every airport lounge in the world has adverts from one of the major IT consultancies selling you on digital transformation, be it Deloitte, DXC, Accenture, or whoever else. Microservices are all the rage. You want microservices. I, of course, also want microservices.

When I talk about microservices, though, I initially focus not on the technology we use to implement them, which is quite interesting, but on the outcome. Why are microservices an interesting architectural choice for us? There are lots of reasons why we might pick a microservice architecture, but the one I keep coming back to again and again is independent deployability. There's a piece of functionality, a change I want to make to how my system behaves, and I want to get that change out as quickly as possible. In this situation, I want to make a change to the shipping service. I should be able to deploy that shipping service into production and release that functionality, when appropriate, to the users of my system, without having to change the rest of the system. This helps us ship software more quickly. It also helps our teams work more autonomously: rather than the whole software ecosystem we own being like a giant fungus that we can't grapple with, we can make targeted changes where appropriate.

We compare these architectures to the monolith. Here we have a very nice monolith; I think this is one of the Standing Stones up in Orkney. You should go to Orkney if you can, there are loads of fantastic monoliths there. We have this vision of the monolith as a single, impenetrable block to which no change can be made. It has apparently become the worst thing in our lives, the millstone around our necks. I think that's grossly unfair. In the last two or three years, "monolith" has become a replacement for the term we used to use, which was "legacy." This is a fundamental problem, because some people now see any monolith as their legacy and therefore as something to be removed. I think that's deeply inappropriate.

Types of Monoliths

Monoliths actually come in multiple shapes and sizes. When I talk about a monolithic application, I'm talking about the monolith primarily as a unit of deployment. There's the classical monolith, which I think many of you have in your heads: all of my code is packaged together into a single process. Maybe it's a WAR file in Tomcat, maybe it's a PHP-based application, but all of my code is packaged together into a single deployable unit, which talks to a database. This is a very simple idea. It's worth bearing in mind that this is a distributed system, albeit a quite simple one. A distributed system is one that consists of multiple computers talking to each other over a non-local network, and here the application talks to its database over the network. All of our code is packaged together in a single process. Importantly, all of our data is in one big, giant database, something that can cause us much pain, suffering, and anguish as our lives move on.

We, of course, have variations on this single-process monolith, such as what some people call the modular monolith. The modular monolith uses cutting-edge ideas from the early 1970s around structured programming, which some of us are still getting to grips with. Here we have taken our single-process monolithic application and broken it down into modules. Those modules, if we get our module boundaries right, can be worked on independently. Deployment, though, is inherently a statically linked approach: we have to link all those modules together to make a deployment. Think about a Ruby application consisting of lots of gems, NuGet packages being packed up, or JAR files being linked together via Maven. I would spit at this point, but I think that's pretty bad in the current viral climate.

So we bundle all of this together as part of our deployment. We still have a monolithic deployment, but a modular monolith has some significant benefits. Because we've broken our code down into modules, we get a degree of independent working. It kind of increases the surface area of the system, making it easier for different teams to work together and to address different aspects of the system. I think this is a highly underrated option. The problem is that people tend not to be very good at defining module boundaries, or, more to the point, even if they are good at defining them, they're not very good at maintaining the discipline to keep them. Unfortunately, a lot of the modular monoliths I see still descend into that single big-ball-of-mud problem. That's not a problem with the concepts of structured programming or modularization themselves.

For many organizations I work with, I think they'd actually be better off with a modular monolith than with a microservice architecture. Of all the clients I've worked with over the last three years, I've told half of them, "Microservices are not for you." Some of those clients even listened to me. For many of them, if you can find a good way of defining these module boundaries, that's good enough. You've got a much simpler distributed system, and you get a degree of independent, autonomous working. Longer term, as our runtimes develop a better concept of what a module is, you might find more people using these kinds of architectures. If you look at the properties of modules in Erlang, for example, they're really impressive. Give it another 15 or 20 years, and maybe Oracle will eventually get Java to the point where it has the same quality of module system that Erlang had 15 years ago. We can only hope. At that point, we might have a system that allows for proper hot deployment of modules into a running system, which could yield some significant benefits.

We have variations on the modular monolith. This one looks a bit odd, but it's something I've proposed a number of times for certain startup-type organizations; in my opinion, microservices are not a good choice for most startups. Here, we've taken the modular monolith idea, but rather than having a single monolithic database backing it, we've broken the database down as well, so the data for each module is stored and managed in isolation. This looks really odd, but it's ultimately a hedging architecture. This is someone saying, "I think I might want to do microservices eventually. I recognize that one of the most difficult things I'm going to do when I decompose a monolithic architecture is dealing with the data tier. So I'm going to come up in advance with what I think will be separate data pools linked to those modules. If I'm working on module C, I have full ownership and control over the data associated with module C. And in the future, if module C becomes a separate service, it should be easier to migrate."
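A minimal sketch of that hedging architecture, assuming Python's built-in `sqlite3` with one in-memory database per module as a stand-in for separate data stores. The module and method names (`InvoicingModule`, `create_invoice`, and so on) are hypothetical. The point is that `OrdersModule` never touches the invoices table; it goes through the invoicing module's methods, so that call could later become a network call to a separate service.

```python
import sqlite3
from typing import Optional


class InvoicingModule:
    """Owns its data outright; other modules only see these methods."""

    def __init__(self) -> None:
        # Private data store for this module (in-memory for the sketch).
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE invoices (order_id TEXT PRIMARY KEY, amount_cents INTEGER)"
        )

    def create_invoice(self, order_id: str, amount_cents: int) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO invoices VALUES (?, ?)", (order_id, amount_cents)
            )

    def amount_for(self, order_id: str) -> Optional[int]:
        row = self._db.execute(
            "SELECT amount_cents FROM invoices WHERE order_id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


class OrdersModule:
    """Owns the orders data; depends on invoicing only via its interface."""

    def __init__(self, invoicing: InvoicingModule) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
        self._invoicing = invoicing

    def place_order(self, order_id: str, amount_cents: int) -> None:
        with self._db:
            self._db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        # No cross-module SQL: if invoicing becomes a separate service,
        # only this call needs to change.
        self._invoicing.create_invoice(order_id, amount_cents)
```

Note that the two writes in `place_order` are no longer in a single transaction. That is exactly the kind of trade-off separate data stores force you to confront early.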

The very first time I saw this pattern, it came from an old colleague of mine, Peter Gillard-Moss, when I was still working at ThoughtWorks. He came up with it for an internal system we were working on. He said, "I think this could work. We're not sure if we want to do services; maybe it should be a monolith." I said, "Look, give it a go, see what happens." I spoke to Peter last year, I think, so about six years on, and they still haven't changed it. It's still running quite happily. They have different people working on different modules, and having the data separated even at that level gives them significant benefits.

Now, of course, we come on to the worst of the monoliths: the distributed monolith. There should be some sort of dark, ominous music at this point. The distributed monolith is a more distributed architecture. Our application code now runs in separate processes that communicate with each other. Yet for whatever reason, we have to deploy the entire system together as part of a lockstep release. This often occurs because we've got our service boundaries wrong. We're smearing business logic all over these different layers. We've not listened to the messages around coupling and cohesion. Now our invoicing logic is in 15 different places across our service stack, and we have to coordinate changes between multiple teams to get anything done. When you see an organization with lots of cross-cutting changes going on, that's often a sign that either your organizational boundaries or your service boundaries are in the wrong place.

The problem with a distributed monolith is that you inherently have a more distributed system, so you have all the associated design, runtime, and operational challenges. You also still have all of those inherent coordination activities. I want to send my thing live, but I can't. I've got to wait till you've done your change, but you can't, because you're waiting on somebody else. Now we're saying, "OK, on the 5th of July, we're all going to go live. Is everybody ready? Three, two, one, deploy." Of course, it all goes fine, doesn't it? We never have any issues with these types of systems.

If your organization has a full-time release coordination manager, or a role along those lines, chances are you have a distributed monolith, because coordinating lockstep deployments of distributed systems is not fun. We end up with a much higher cost of change. The scope of each deployment is much larger, so more can go wrong. We also have this inherent coordination activity, probably not just around releases but around deployment in general. Set agile aside and just look at lean manufacturing. Even a cursory examination of lean manufacturing teaches you that reducing handoffs is key to optimizing throughput. If I'm waiting on somebody else to do something for me, that creates waste and bottlenecks in our throughput. If you want to ship software more quickly, reducing handoffs and reducing coordination is key. Distributed monoliths, unfortunately, tend to create an environment in which that coordination just has to happen. Sometimes the cause isn't where your service boundaries are; sometimes it stems purely from how you run your software development process.

Some people fundamentally misunderstand the release train. The release train was always considered a remedial release technique, not an aspirational one. It was never "One day, you too can jump on a release train." No, it was like training wheels on a bike. If you were helping an organization move to continuous delivery, you might pick something like a release train. The idea behind a release train is: "On a regular basis, maybe every four weeks, all of the software that's ready goes out the door. If your software isn't ready, it gets on the next release train." For many organizations, this is actually a step forward, and it's a good thing. But the idea is that you eventually get rid of it: you increase how often the release train leaves, and eventually you do away with it altogether. All too many organizations, though, adopt the release train and never change.

If you've got a bunch of teams all working to the same release train, where every four weeks all the software that's ready goes out together, then you suddenly have lots of services being deployed at once every time the train leaves. This is a real issue. If you're practicing the release train, one of the most important things you can do is, at the very least, break it down into per-team release trains. Allow separate teams to decide when their train leaves the station. Ultimately, you should get rid of these things. This is a stepping stone towards continuous delivery.

Unfortunately, some excellent, really good efforts around marketing agile have codified the release train as the ultimate way of delivering software. We know they've done that because the phrase "release train" appears in the SAFe diagrams, which you'll see as A1 laminated pictures on the walls of many corporate organizations. This is not good. Whatever other problems SAFe might have, the release train is a remedial technique. It's training wheels on your bike. You're supposed to be moving forward to continuous delivery. The problem is, if you stick with release trains for too long, you will end up with a distributed monolith, because you get used to the idea that all of your services get deployed together. Just be aware of that. It may not happen overnight. You might start off with an architecture that can be deployed independently, but if you stick with this for too long, you won't have one anymore, because the architecture will start to coalesce around those release practices.

Ultimately, the distributed monolith is problematic because it has all the complexity of a distributed system, but also the downsides of a single unit of deployment. We really want to move past this to better ways of working. The distributed monolith is a tricky thing to deal with, and there's lots of advice out there about how to handle it. Sometimes the right answer is to merge it back into a single-process monolith. But if you've got a distributed monolith today, your best bet is to work out why you've got one and start making parts of the architecture independently deployable before you add any new services. Adding new services into that mix is likely to make your world much more difficult.

Microservice Architecture

Coming back to our microservice architecture, we want this property of independent deployability. I want to be able to change my service and deploy it into production without changing anything else. This is the golden rule of microservices, the main rule of microservices club. We want to do this. This is lovely. Look, isn't it great? On a slide, it's really easy. In real life, it's a lot more difficult to make it happen, especially given that most people don't start with a blank sheet of paper. For the vast majority of people in this audience, the state of the world is a system that feels too big, which they need to make smaller and break apart. Almost by definition, if I'm going from this to a microservice architecture, the thing I've got right now is too big. How on earth do I know where to start?

Now, I've spoken before about the importance of things like domain-driven design. Domain-driven design has some great ideas that can help us find our service boundaries. Often, though not always, when I'm working with organizations looking at a microservice migration, the very first thing we do is perform some kind of domain-driven modeling exercise on the existing monolithic application. The idea is to ask, "What are the things that happen inside the monolith? What units of work can we find from a business domain point of view?"

Although it might look like one big, giant box, when we project a logical, domain-driven design model onto that monolith, we realize, "No, we've got things in here around order management, invoicing, loyalty, PDF rendering, sending notifications to clients, and recommending new items to our customers." These things exist inside the monolith. Our code is almost certainly not organized around these concepts, but when we think about it from the point of view of our users and of a business domain model, these concepts exist in our code. For reasons I won't go into now, those business domain boundaries, often called bounded contexts in domain-driven design speak, become our units of decomposition.

The first thing you want to ask is, "Where do I start? What can I prioritize? What are the units of work I've got in here?" Here, I can say, "I've got order management, invoicing, and notifications. Which piece should I start with?" Another thing you'll get out of a domain-driven design exercise is a sense of how these things are related. Hopefully, you'll come up with a directed acyclic graph of dependencies between these different pieces of functionality. If your dependency graph has cycles, you have to do some more work. In this situation, I can see that lots of things end up depending on notifications, on the ability to send various forms of notifications to our customers. That seems to be quite a core part of our domain.
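The dependency graph from such an exercise can be checked mechanically. Here is a small sketch using Python's standard-library `graphlib` (Python 3.9+). The dependency map is a hypothetical rendering of the contexts named above, where each key depends on the contexts in its set. `find_cycle` returns `None` for a DAG, or one offending cycle otherwise.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set

# Hypothetical output of a domain-modeling exercise:
# each bounded context maps to the contexts it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "order_management": {"invoicing", "notifications"},
    "invoicing": {"notifications", "pdf_rendering"},
    "loyalty": {"notifications"},
    "recommendations": {"notifications"},
    "pdf_rendering": set(),
    "notifications": set(),
}


def find_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return None if the graph is acyclic, else one cycle (first node == last)."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        return err.args[1]  # graphlib puts the detected cycle here
    return None


def dependencies_first(deps: Dict[str, Set[str]]) -> List[str]:
    """An ordering in which every context appears after the ones it depends on."""
    return list(TopologicalSorter(deps).static_order())
```

A non-`None` result from `find_cycle` is the "more work" signal: those contexts need untangling before they make clean units of decomposition.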

I can already start asking questions based on this understanding: "Should I extract notifications first?" [inaudible 00:17:51] first thing I start with, and I can look at it purely through this lens. On the face of it, I might say, "Notifications is used by lots of things, so if microservices are better, then extracting something used by lots of parts of my system will make more things better. Maybe I should start there." Another part of me looks at all those inbound dependencies and thinks, "Hang on a minute. If this is a monolithic architecture and loads of things call out to this, how am I going to detangle it from my system? How am I going to rip it out of the existing monolithic architecture?" On the other hand, invoicing and order management are concepts that exist in the monolith but seem more self-contained, so they're likely to be easier to decompose.
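Counting inbound edges is one rough way to make that trade-off visible. This sketch uses an assumed heuristic, not a rule from the talk, and it ignores how tangled each context is inside the code. Using a hypothetical dependency map, it lists contexts from fewest to most dependents, so a heavily depended-upon context like notifications sorts last and order management and invoicing sort well ahead of it.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical map: each bounded context -> the contexts it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "order_management": {"invoicing", "notifications"},
    "invoicing": {"notifications", "pdf_rendering"},
    "loyalty": {"notifications"},
    "recommendations": {"notifications"},
    "pdf_rendering": set(),
    "notifications": set(),
}


def inbound_counts(deps: Dict[str, Set[str]]) -> Counter:
    """How many other contexts depend on each context."""
    counts = Counter({ctx: 0 for ctx in deps})
    for targets in deps.values():
        for target in targets:
            counts[target] += 1
    return counts


def fewest_dependents_first(deps: Dict[str, Set[str]]) -> List[str]:
    """Contexts sorted by inbound dependency count, ties broken by name."""
    counts = inbound_counts(deps)
    return sorted(counts, key=lambda ctx: (counts[ctx], ctx))
```

Treat the ranking as a conversation starter. What you are trying to achieve with the migration should still drive which context goes first.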

Now, of course, there's something inherent in what I'm talking about. I'm asking, "Where do I start? Which piece do I start with?" That fundamentally implies an incremental approach to decomposition. We'll come back to that in a minute, but before we do, I want you all to really take this next message to heart: the monolith is not the enemy. I want you to really think about that. It's not. It's not the problem you've got. People see any monolithic system as a problem: "I can't do that [inaudible 00:19:12] microservices." One of the most concerning things I've seen over the last couple of years is that microservices seem, for many, to have become the default choice. "Of course we're going to do microservices." Why?

I can see some of you in the audience have gray hair, those of you who still have hair. Some of you may remember an old saying: "Nobody ever got fired for buying IBM." The idea was that because everybody else was buying IBM, you might as well buy IBM too, because if the things you bought didn't work for you, it couldn't be your fault; everybody was doing it. You didn't have to stick your neck above the parapet. Now that everyone's doing microservices, we have the same problem. Everyone's going, "Microservice. Microservice. Microservice." That's good for me; I write books on the subject. I win. It might not be good for you.

Fundamentally, it comes down to this: what problems are you trying to solve? What are you trying to achieve that your current architecture doesn't let you do? Maybe microservices are the answer, or maybe something else is. It's also really important that you understand what you're trying to achieve, because without that, it's going to be very difficult to understand how to migrate your system. What you're trying to achieve will change how you decompose the system and how you prioritize that work.

When we think about microservice migrations, the metaphor I try to use is that it's not like a switch. It's not an off or on state, not zero or one. It's more like a dial you're turning. As you start adopting microservices, you turn that dial up and add one or two services. You want to see how they work. Do they work for you? Do they give you what you need? Do they solve the problems you've got? If they do and you like them, you can keep turning that dial. The idea is that you turn the dial up, deploy some services, learn from that, and, if it works, keep turning.

What I see a lot of people do, though, is go, "I think microservices would be good." "How many of you have?" "[inaudible 00:21:05] 500." "We'll crank that dial all the way around, then plug the headphones in and see how the volume is." That's a great way to blow your eardrums. It's a bad idea because you just don't know the problems you're going to face, the things that won't hit you on your developer laptop. They're going to hit you in production. If you go straight from a monolithic system to 500 services, all of those issues hit you at once. How far along the spectrum should you be? Do you want one service, two, five? Do you want to be like Monzo and have something like 800 or 1,500 services? That works well for them, it seems. It might not work well for you. How do you know how far you need to turn that dial? That's your choice, but this idea of turning the dial is important. To start off your migration, pick your first few services. Get them deployed, get them running in production, and learn from that experience. Work out where your pain points are, address them, and then turn the dial up, because this is the real problem.

You are not going to fully appreciate the sheer terror, horror, pain, suffering, anguish, and expense of running microservices until they're actually in production. That's where the vast bulk of the nasty problems hit you. If your biggest issue is a developer whinging that they haven't got enough RAM to run all the microservices on their laptop, you're doing quite well, but it might also mean you're not in production yet. Production is what's going to hurt you, so you need to bring that learning forward as quickly as possible. The problems each of you will face are going to depend on so many different factors.

You want to take some functionality out of your monolithic system, extract it, have it talk to and integrate with the monolith, and do that as quickly as you possibly can. This is really important. We don't want to do big-bang rewrites anymore. We used to do them. When you deployed software to your customers and users once a year, you had a 12-month window in which you could say, "We've treated our existing system so badly that it's impossible to work with, but we've got 12 months until the next release. If we try really hard, we can completely rewrite the system, we won't make any of the mistakes we made in the past, we'll have all the existing functionality and a lot more besides, and it's all going to be fine." That was never true when we were releasing software every year. I don't know how we justify it now, when people expect software to be released on a monthly, weekly, or daily basis. To paraphrase Martin Fowler, "If you do a big-bang rewrite, the only thing you're certain of is a big bang." Now, I love explosions in action films, but not necessarily in my IT projects. We really need to think differently about how we make these changes happen.

Patterns for Taking an Existing Monolithic Application and Moving It to a Microservice Architecture

I'm a big fan of incremental evolution of architectures. We shouldn't see our architectures as fixed, unchanging things. We need patterns that help us change systems incrementally. For the rest of the talk, I'm going to share a few different patterns, a few different ways of taking an existing monolithic application and moving it to a microservice architecture.

One of the first we could start with is the strangler fig application pattern. This picture is of a tree with a fig wrapped around it, in a rainforest in Queensland. I used to live in Australia. I love the place, it's fantastic, but Australia is a dangerous place. The weather wants to kill you, the sun wants to kill you, the things on the land want to kill you. The nine most poisonous snakes and spiders in the world are in Australia. Everything in the sea really wants to kill you, sharks and jellyfish. The people and the food are nice. Even the plants can sometimes have quite vicious names.

What you're seeing here is a vine wrapped around a tree; it's actually a type of plant called a strangler fig. The way strangler figs grow is quite interesting. They take root in the canopy of trees and send tendrils down around the tree, wrapping themselves around the existing structure. On its own, a strangler fig couldn't get up into the canopy of these forests to get enough sunlight. Rather than growing up from a sapling like a normal tree would, it wraps itself around an existing structure and relies on the existing height and strength of the tree. Over time, as these figs become bigger and more mature, they may be able to stand by themselves. If the underlying tree dies and rots away, you're often left with a hollow column in the middle. When you see these things in real life, they look like wax has been dripped around other trees. Really disturbing-looking stuff.

This idea is really useful when it comes to thinking about an application migration strategy. We take an existing system that does all the things we want it to, our existing monolithic application, and start to wrap our new system around it. In our case, that's going to be our microservice architecture. There are two key pieces to implementing a strangler fig application. The first is what's called asset capture: the process of identifying which functionality we're going to migrate, finding the box of stuff we're going to move. We're thinking here logically of moving this functionality into a new place, in our situation a microservice architecture. Second, we need to be able to divert calls. Calls that used to go to the monolithic application have to be diverted to wherever the new functionality lives. If the functionality hasn't been migrated, those calls are not diverted. It's pretty straightforward stuff. There are lots of different ways to implement it. I'm going to give you the simplest one, which uses a good old-fashioned bit of HTTP.
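To make the two pieces concrete, here is an in-process sketch in Python, not the HTTP approach the talk uses. `StranglerFacade`, `capture`, and `call` are hypothetical names. `capture` records that a capability has moved (asset capture), and `call` diverts requests for captured capabilities to their new home while everything else still goes to the monolith.

```python
from typing import Callable, Dict

# A handler takes a capability name and a request payload and returns a response.
Handler = Callable[[str, dict], str]


class StranglerFacade:
    def __init__(self, monolith: Handler) -> None:
        self._monolith = monolith
        self._new_homes: Dict[str, Handler] = {}  # capabilities captured so far

    def capture(self, capability: str, new_home: Handler) -> None:
        """Asset capture: mark a capability as migrated to `new_home`."""
        self._new_homes[capability] = new_home

    def call(self, capability: str, payload: dict) -> str:
        """Divert captured capabilities; everything else goes to the monolith."""
        handler = self._new_homes.get(capability, self._monolith)
        return handler(capability, payload)
```

Callers only ever talk to the facade, so they don't notice when a capability moves, which is the same transparency HTTP gives you when a proxy sits in the path.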

Before I forget: when we talk about moving functionality, some people get a bit confused. You might be really lucky and be able to copy and paste the code. You might say, "I'm going to create an invoicing service, and look, all of my code is in a nice box in my monolithic code base called invoicing. I'll copy and paste it into my new service." I would argue that if that's the state your code base is in, you probably don't need my help, because you've already got a nice code base to work with. More likely, you're going to have to go scurrying around your system trying to drag all the bits of invoicing together, which may involve some pre-refactoring exercises. Maybe you can reuse that code, but in that case, it's going to be a copy and paste, not a cut and paste: we want to leave the functionality in the monolith as well. We'll come back to why in a minute. More often, people will do a bit of a rewrite. They'll say, "The thing I want to extract is the invoicing functionality, so I'm just going to rewrite it." That tends to be more common. If you're lucky, you might be able to reuse the code.

Coming back to our example of a strangler fig implementation: here we've got a monolithic system that is driven via HTTP. This could be a headless application, or we could be intercepting calls at an API boundary underneath the user interface. What we need is something that allows us to redirect calls, so we're going to make use of some kind of HTTP proxy. HTTP works so well as a protocol for these kinds of architectures because it's extremely amenable to transparent redirection of calls. I make a call over HTTP, it can be diverted to lots of different places, and I, from a client point of view, do not care. There's loads of software out there that can do this for you, and it's extremely simple.
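In practice you would use off-the-shelf software for this, such as NGINX or HAProxy. Purely to show how little a pass-through proxy has to do, here is a minimal sketch using only Python's standard library (`http.server` and `http.client`). It forwards every request unchanged to a single upstream, which is the monolith at this stage. The host and port are placeholders, it buffers whole bodies in memory, and it speaks only plain HTTP, so treat it as an illustration, not something to run in production.

```python
import http.client
import http.server

# Hop-by-hop and server-managed headers not copied through (simplified list).
_SKIP = {"connection", "keep-alive", "proxy-connection", "transfer-encoding",
         "te", "trailer", "upgrade", "content-length", "host", "server", "date"}


def make_proxy_handler(upstream_host: str, upstream_port: int):
    """Build a request handler that forwards everything to one upstream."""

    class PassThroughProxy(http.server.BaseHTTPRequestHandler):
        def _forward(self) -> None:
            length = int(self.headers.get("Content-Length") or 0)
            body = self.rfile.read(length) if length else None
            headers = {k: v for k, v in self.headers.items() if k.lower() not in _SKIP}
            conn = http.client.HTTPConnection(upstream_host, upstream_port, timeout=10)
            try:
                # Same method, same path and query string, same body.
                conn.request(self.command, self.path, body=body, headers=headers)
                upstream = conn.getresponse()
                data = upstream.read()
            finally:
                conn.close()
            self.send_response(upstream.status)
            for k, v in upstream.getheaders():
                if k.lower() not in _SKIP:
                    self.send_header(k, v)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        do_GET = do_POST = do_PUT = do_PATCH = do_DELETE = _forward

        def log_message(self, *args) -> None:  # keep the sketch quiet
            pass

    return PassThroughProxy
```

Wiring it up might look like `http.server.ThreadingHTTPServer(("0.0.0.0", 8000), make_proxy_handler("127.0.0.1", 8080)).serve_forever()`, with port 8080 standing in for wherever the monolith listens.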

The very first thing you would do is put a proxy between the upstream traffic and your downstream monolithic system, and do nothing else. You would deploy it into production. At this point, no calls are being diverted. Hopefully, this should just work. If it doesn't, you can find out why now, but do this in production. One of the things we worry about a little here is the quality of your network. We've added a network hop: calls used to go straight into your monolithic system, and now they go via your proxy. Latency is the killer in these situations. Hopefully, adding a pass-through proxy as a separate network hop should add only a very small number of milliseconds of overhead to your existing calls; less than 10 would be great. If adding one network hop adds something like 200 milliseconds of latency, you're going to need to pause your microservice migration, because you've got big issues that need to be solved. You might think that's unlikely.
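One way to check that hop is to sample request latencies with and without the proxy in the path and compare them. The sketch below assumes you already have two lists of latency samples in milliseconds, each with at least two entries. The function and type names are hypothetical, and the 10 ms default budget simply mirrors the figure mentioned in the talk. It checks the median delta against the budget and also reports the p99 delta, since a badly routed network path can show up in the tail as well as the median.

```python
import statistics
from typing import List, NamedTuple


class HopOverhead(NamedTuple):
    median_delta_ms: float
    p99_delta_ms: float
    within_budget: bool


def _p99(samples: List[float]) -> float:
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]


def proxy_hop_overhead(
    direct_ms: List[float], proxied_ms: List[float], budget_ms: float = 10.0
) -> HopOverhead:
    """Compare latency samples taken without and with the proxy in the path."""
    median_delta = statistics.median(proxied_ms) - statistics.median(direct_ms)
    p99_delta = _p99(proxied_ms) - _p99(direct_ms)
    return HopOverhead(median_delta, p99_delta, median_delta <= budget_ms)
```

If `within_budget` comes back false, that is the moment to pause and investigate the network before diverting any traffic.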

It happened to me. I was working in London, about 10 years ago now. We had a real performance issue with two services that were talking to each other. They were both in a London data center; in fact, they were even in the same rack. We had these huge latency spikes, and we couldn't work out what was going on. We thought it was our software. Eventually, we found out that, for reasons known only to the networks team, all traffic between these two services in London was being routed via Luxembourg. If that's your network configuration, this kind of architecture is going to be an issue. Putting the proxy in place first allows us to identify those deltas and spot that kind of problem quite quickly. If it's all good, you move on. Fantastic. If it's not, you can back it out straight away and dive deep into what the problems are.

The next thing we're going to do is start working on our brand new service. I've deployed my new invoicing service, and I'm starting to work on that functionality. I would deploy that into production. I can do that safely, because it's not being used. We've got to separate these ideas in our heads. In fact, if this is your first microservice, I would encourage you to deploy it into production on a regular basis to make sure your deployment mechanism works. You can test that service in isolation as you add the functionality. It's not released to your customers or users yet, but it's in the production environment. You can get it hooked up to your dashboards, make sure the log aggregation is working, whatever else you want to do. When you think the functionality is equivalent to the old system, you just reconfigure the proxy to divert calls from the old functionality over to your new functionality. That's it. It's quite straightforward. Now, I have sidestepped data. We'll come back to data a bit later on.
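The proxy's role in these steps boils down to a routing table: start with nothing diverted, add a route to release, remove it to roll back. The sketch below models only that decision logic, assuming path-prefix routing; the upstream URLs and class name are hypothetical, and in practice this would live in the configuration of whatever off-the-shelf proxy you use rather than in application code.

```python
from dataclasses import dataclass, field

# Hypothetical upstream addresses, for illustration only.
MONOLITH = "http://monolith.internal"
INVOICING = "http://invoicing.internal"


@dataclass
class StranglerRouter:
    """Routing decisions for a strangler fig proxy, keyed on request path prefixes."""
    default_upstream: str                       # everything not diverted goes to the monolith
    routes: dict[str, str] = field(default_factory=dict)

    def upstream_for(self, path: str) -> str:
        # A prefix matches the path itself or anything beneath it ("/invoices" matches "/invoices/42").
        matches = [p for p in self.routes
                   if path == p or path.startswith(p.rstrip("/") + "/")]
        if not matches:
            return self.default_upstream
        # The longest (most specific) matching prefix wins.
        return self.routes[max(matches, key=len)]

    def divert(self, prefix: str, upstream: str) -> None:
        """Release step: send calls for `prefix` to the new service."""
        self.routes[prefix] = upstream

    def roll_back(self, prefix: str) -> None:
        """Remediation: send calls for `prefix` back to the monolith."""
        self.routes.pop(prefix, None)


router = StranglerRouter(default_upstream=MONOLITH)   # step 1: pass-through, nothing diverted
before = router.upstream_for("/invoices/42")
router.divert("/invoices", INVOICING)                 # step 2: release the new service
after = router.upstream_for("/invoices/42")
router.roll_back("/invoices")                         # problem in production? divert back
rolled_back = router.upstream_for("/invoices/42")
```

Because the monolith still holds the invoicing code, `roll_back` is a pure configuration change.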

The example I'm using here is HTTP based, but I've seen this work with FTP. I've done this with message interceptors. I've done this with fixed file uploads: you basically intercept the fixed file, strip out the stuff that you want for your new service, and pass the rest on. It's a really simple technique, and it works surprisingly well in a large number of situations. The key thing here is that we've made one small step, which is one service. Even extracting that one service can itself be broken down into lots of little steps: getting the skeleton service up, implementing the methods, testing it in production, making sure it's working, and then doing the release. The nice thing is that although this functionality now lives in our new service, we haven't removed it from the monolithic application yet. If we hit an issue in production, we've got an extremely fast remediation technique: we just change the proxy configuration and divert the traffic back to the monolith, because the functionality is still there.
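For the file-based variant, the interceptor's core job is a split. A minimal sketch, assuming a hypothetical pipe-delimited format with one record per line and a record-type code in the first field; the real file layout will differ.

```python
def split_batch(lines: list[str], wanted_type: str,
                sep: str = "|") -> tuple[list[str], list[str]]:
    """Intercept a batch file: pull out the records the new service owns, pass the rest on.

    Assumes one record per line with a record-type code in the first field.
    Record order is preserved in both outputs.
    """
    for_new_service: list[str] = []
    for_monolith: list[str] = []
    for line in lines:
        record_type = line.split(sep, 1)[0]
        (for_new_service if record_type == wanted_type else for_monolith).append(line)
    return for_new_service, for_monolith


batch = ["INV|1001|250.00", "ORD|77|3", "INV|1002|99.50"]   # hypothetical records
new_records, monolith_records = split_batch(batch, "INV")
# new_records      -> ["INV|1001|250.00", "INV|1002|99.50"]
# monolith_records -> ["ORD|77|3"]
```

Matching on the whole first field, rather than a string prefix, avoids accidentally capturing a record type such as `INVX`.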

This is, and should be, a true refactoring. Refactoring is where you change the structure of the code without changing its behavior. The new functionality should be functionally equivalent, so we should be able to switch backwards and forwards at will until we're happy that it's working properly. This is also why, while you're doing this migration, you probably wouldn't be adding new functionality at the same time. You can chip away at this; it's a nice process.

Now, that's going to work great for some things. Coming back to our directed acyclic graph of dependencies inside our monolith, this approach works well with something like invoicing or order management, pieces of functionality that are likely to sit higher up in your call stack. What about something like the ability to award loyalty points, or the ability to send notifications to your customers? There's no call that comes into the monolithic system that says, "Send an email to Sam about his order," or, "Let Sarah know she's been awarded some points." That's not what happens. Instead, the call that comes into the monolith is "Place order" or "Pay invoice." As a side effect of those operations, we might award points or send the email. As a result, we can't intercept calls to, say, loyalty or notifications at the perimeter of our monolith. At this point, we actually have to go inside the monolith itself to make those changes happen. Let's take a look. Imagine we're going to extract notifications. We've got all these inbound links. How can we extract that piece of functionality incrementally without breaking the rest of the system?

Here's another technique that can work really well, called branch by abstraction. You may have heard of branch by abstraction in the context of trunk-based development, which I think is a very good way of developing software. Branch by abstraction is also incredibly useful as a pattern in this context. The way it works is that we create a space in our existing monolithic system where two implementations of the same piece of functionality can coexist. In many ways, this is the Liskov substitution principle in action: each is a separate implementation of exactly the same abstraction. Here's an example of how we do this. We've got our existing code, and we want to extract the notifications functionality.

We've got notifications code scattered all over our system. The very first thing we want to do is gather up all the notifications functionality that's going to go into our new service and hide it behind an abstraction point. We want our invoicing code and our orders code to call out to this functionality via that abstraction point. Maybe we create a brand new notifications interface. To begin with, the only implementation of that interface is a class that holds all the existing functionality, which lives inside the monolithic system. All of our calls out to SMTP libraries, calling out to Twilio to send SMSes, or sending TikTok messages (I think that's how that works) would all get bundled into this class.

At this point, all we've done is create a nice abstraction point in our code. We could stop here, and we've made our code base nicer and more testable, which doesn't sound like a bad thing to do anyway. This is a good old-fashioned bit of refactoring. We've now created, though, a situation where we can change the implementation of notifications that invoicing or orders uses. This is step one: a refactoring effort that could be done over a period of days or weeks while you're doing other stuff, like actually shipping features. Again, all these refactoring patterns can be folded in while you're still shipping functionality.
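A minimal sketch of this first step, assuming a Python code base. The names `Notifications`, `LegacyNotifications`, `Invoicing`, and `send` are hypothetical, and recording messages in a list stands in for the real SMTP and SMS calls so the example runs on its own.

```python
from abc import ABC, abstractmethod


class Notifications(ABC):
    """The abstraction point that invoicing and orders code call through."""

    @abstractmethod
    def send(self, customer_id: str, message: str) -> None: ...


class LegacyNotifications(Notifications):
    """Initially the only implementation: the existing monolith code, gathered into one class."""

    def __init__(self) -> None:
        # Stand-in for the real SMTP/SMS calls so this sketch runs on its own.
        self.sent: list[tuple[str, str]] = []

    def send(self, customer_id: str, message: str) -> None:
        self.sent.append((customer_id, message))


class Invoicing:
    """Existing monolith code, now depending on the abstraction rather than on SMTP/SMS code."""

    def __init__(self, notifications: Notifications) -> None:
        self.notifications = notifications

    def pay_invoice(self, customer_id: str, invoice_id: str) -> None:
        # ... existing payment logic ...
        self.notifications.send(customer_id, f"Invoice {invoice_id} paid")


legacy = LegacyNotifications()
Invoicing(legacy).pay_invoice("sam", "INV-1")
```

Nothing about behavior has changed yet; `Invoicing` simply no longer knows which implementation it is talking to.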

Then we start working on our brand new implementation of notifications. This is going to be split into two bits. We've got the implementation of the interface that lives inside the monolith, but that really is just client code calling out to your new notification service. We can work on these implementations, check them in, and deploy them, because, again, we can deploy them safely: they're not being used. This means we're integrating our code more frequently, reducing the merge effort, and making sure everything works.

Once we're happy that our new service-calling implementation works, all we need to do is switch the implementation of the abstraction we're using. This is what we use feature toggles, or feature flags, for. You could use tools and platforms like LaunchDarkly or Split.io for this sort of stuff, or a text file, whatever you want to do. There are lots of kinds of feature toggle: runtime, build time, deploy time, those sorts of things. If I have a problem, I haven't removed the old functionality yet, so I can flick that toggle back and go back to the old functionality. Again, this is a small step, this is one service, but this small step is also broken down into lots of smaller steps. We're trying to get to production as quickly as possible in all of these steps.
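The second implementation and the toggle can be sketched as follows. This assumes a simple in-process flag dictionary rather than any particular toggle platform, and a hypothetical injected `post` function standing in for the HTTP call to the new notification service; the flag name `use_notification_service` is also hypothetical.

```python
from typing import Callable, Protocol


class Notifications(Protocol):
    def send(self, customer_id: str, message: str) -> None: ...


class LegacyNotifications:
    """Old implementation, still inside the monolith (recording stands in for SMTP/SMS)."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, customer_id: str, message: str) -> None:
        self.sent.append((customer_id, message))


class NotificationServiceClient:
    """New implementation: lives in the monolith but only forwards calls to the new service."""

    def __init__(self, post: Callable[[str, dict], None]) -> None:
        # Hypothetical injected transport, e.g. a function doing an HTTP POST.
        self.post = post

    def send(self, customer_id: str, message: str) -> None:
        self.post("/notifications", {"customer_id": customer_id, "message": message})


def current_notifications(flags: dict, legacy: Notifications,
                          service: Notifications) -> Notifications:
    """Pick the implementation per call, so flipping the flag back is an instant rollback."""
    return service if flags.get("use_notification_service", False) else legacy


posted: list = []
legacy = LegacyNotifications()
service = NotificationServiceClient(lambda path, body: posted.append((path, body)))
flags = {"use_notification_service": False}

current_notifications(flags, legacy, service).send("sam", "Order shipped")  # old code path
flags["use_notification_service"] = True
current_notifications(flags, legacy, service).send("sam", "Order shipped")  # new service
```

Reading the flag on every call, rather than once at startup, is what makes this a runtime toggle.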

Once I'm happy [inaudible 00:36:28] working, I can then, if I want to (it is optional), clean up the code. Maybe I remove the flag once it's no longer needed. I could even remove the old code, and it's really easy to remove because you spent some time earlier putting all of that code into a nice box. Now you just delete that class, and it's gone. Look, you've made the monolith smaller, and everyone feels good about themselves. Again, this is the branch by abstraction pattern, and it's incredibly useful.

Now, for restructuring and refactoring code, I strongly recommend the book "Working Effectively with Legacy Code" by Michael Feathers. Michael's definition of legacy code is code without tests. It's got loads of great ideas about how you find these abstractions; he calls them [inaudible 00:37:12] seams. It covers how to create those abstractions safely in a code base without disrupting the existing system. It's a really excellent book. I think the original version was written for C++, but the code examples have since been done in Java, Ruby, .NET, Python, and so on. It's well worth the read. Even if you don't go to microservices, those first couple of steps, just creating that abstraction point, are probably going to leave your code in a better, more testable state anyway.

I said earlier that it's a good idea not to remove the old implementation too quickly, and that there are actually some benefits to having both implementations there at the same time. In this situation, we've got both implementations live in the monolith at once. That opens up some really interesting approaches to how we deploy and roll out our software. Rather than calling the old implementation or the new implementation, why don't we call both? This is called a parallel run. When a call comes into the abstraction point, I call both implementations. Why would I want to do that? Comparison. This should be a refactoring, and a refactoring is where we change the structure of the code but not the behavior. We want to make sure our new microservice-based implementation is functionally equivalent. If I execute both copies of that functionality, I can compare the results.

I've done this a few times. We did this at an organization that was dealing in some interesting financial instruments. This was pre-GFC, so it was very interesting. We had to make sure we were generating exactly the same numbers from the old system and the new system, because those numbers directly impacted the bonuses paid to the traders at the end of each quarter. They really wanted to make sure those numbers were exactly the same.

You just execute both implementations and compare the results. Now, you have to consider what your source of truth is. I wouldn't necessarily want both implementations to be my source of truth, because in the case of notifications, that could result in me sending two emails to people when we only want to send one. This technique can be incredibly useful because you get a direct, live comparison, not just of functional equivalency but also of the non-functional characteristics. It's not just, "Did I create the right email? Did I send it to the right dummy SMTP server?" but also, "Did the service I've created respond quickly enough? Am I getting an acceptable error rate? Am I getting decent 95th percentile response times?" or whatever else it is. We're actually able to compare both.

Normally, your new implementation runs side by side with your old one, but the old one is the trusted implementation: its results are the ones you actually use. Once you trust the new implementation, you say, "Let's now make the new implementation the trusted one, and keep running them side by side for a period of time." Eventually, we can get rid of the old one.
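A parallel run at the abstraction point can be sketched as a wrapper that calls both implementations, records mismatches and timings, and returns only the trusted result. This is in the spirit of GitHub's Scientist but is not its API; the class and field names are hypothetical. It calls the two implementations sequentially and assumes the candidate is side-effect free or pointed at something harmless, such as a dummy SMTP server, so that customers are not sent two emails.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ParallelRun:
    """Calls both implementations, returns only the trusted result, records mismatches and timings."""
    legacy: Callable[..., Any]
    candidate: Callable[..., Any]
    trust_candidate: bool = False           # flip once the new implementation has earned trust
    mismatches: list = field(default_factory=list)
    timings_ms: dict = field(default_factory=lambda: {"legacy": [], "candidate": []})

    def _timed(self, name: str, fn: Callable[..., Any], *args: Any) -> tuple[Any, Any]:
        start = time.perf_counter()
        try:
            return fn(*args), None
        except Exception as exc:            # captured so an untrusted failure can't break the caller
            return None, exc
        finally:
            self.timings_ms[name].append((time.perf_counter() - start) * 1000)

    def __call__(self, *args: Any) -> Any:
        old, old_err = self._timed("legacy", self.legacy, *args)
        new, new_err = self._timed("candidate", self.candidate, *args)
        # Functional comparison: same value, and same kind of failure (if any).
        if old != new or type(old_err) is not type(new_err):
            self.mismatches.append({"args": args, "legacy": old,
                                    "candidate": new, "candidate_error": new_err})
        value, err = (new, new_err) if self.trust_candidate else (old, old_err)
        if err is not None:
            raise err
        return value


run = ParallelRun(legacy=lambda x: x * 2, candidate=lambda x: x * 3)
result_old_trusted = run(2)      # old result is used; the mismatch is recorded
run.trust_candidate = True
result_new_trusted = run(2)      # now the new implementation is the source of truth
```

The recorded `timings_ms` give the non-functional side of the comparison, for example a p95 per implementation.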

GitHub do this a lot. They've actually created a library called GitHub Scientist. It's a little Ruby library for wrapping different implementations and comparing them. They use it wherever they're refactoring critical code paths in their application, and it gives you a way of doing this kind of live comparison. GitHub Scientist has been ported to a bunch of different languages now, including, inexplicably, three different ports for Perl. Clearly, parallel runs are a big thing in the Perl community. If you're interested in doing parallel runs inside your application, there's loads of good advice out there on how to do this.

You can also do this at different levels. When I first did this, we didn't do a live comparison; we did an offline one. We ran both subsystems in parallel, did an overnight comparison of the results generated, and sent an email. Every morning I would come in to an email, an XML document telling me all the things we'd got wrong, which was quite good. Spotting those things before your customers and users spot them is really important. I've mentioned a couple of times this idea of making a change and running it in the production environment without releasing that functionality to our customers yet. That gives us the ability to make sure it's working and to spot problems before the end users of our software do.
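The offline variant is essentially a diff of two result sets. A minimal sketch, assuming each system's overnight output can be loaded as a dictionary keyed by a record ID with string keys; the report here is plain text lines rather than the XML document described above, and the IDs and values are hypothetical.

```python
def compare_runs(legacy: dict, candidate: dict) -> list[str]:
    """Compare two overnight result sets keyed by record ID; return human-readable discrepancies."""
    report = []
    for key in sorted(legacy.keys() | candidate.keys()):
        if key not in candidate:
            report.append(f"{key}: missing from new system")
        elif key not in legacy:
            report.append(f"{key}: only in new system")
        elif legacy[key] != candidate[key]:
            # Exact comparison: for numbers that drive bonuses, "close enough" isn't.
            report.append(f"{key}: old={legacy[key]!r} new={candidate[key]!r}")
    return report


old_results = {"T1": 100, "T2": 250, "T3": 75}
new_results = {"T1": 100, "T2": 251, "T4": 10}
discrepancies = compare_runs(old_results, new_results)
# discrepancies -> ["T2: old=250 new=251", "T3: missing from new system", "T4: only in new system"]
```

An empty list means the two systems agreed for that run; anything else is what would go in the morning email.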

Fundamentally, what we're trying to do here is separate two concepts in our heads that have previously been bound together: the idea of deployment and the idea of release. Traditionally, we would consider these two activities to be one and the same. We take our software and deploy it, and the act of deployment is the same as releasing our software to our production users. This is why everyone's scared of anything happening in production, and that's how production becomes this gated environment.

We can separate these two concepts. The act of deploying something into production is not the same as releasing it to our customers. This is really the underpinning idea behind what people are now calling progressive delivery, which is an umbrella term for a bunch of different techniques: think canary releasing, A/B testing, parallel runs, blue/green deployments, and dark launching. The idea is that we can be smarter about getting our software into production quickly, without it having to reach all customers, or any customers. We can get it out there, see if it works, test it ourselves, and bear that pain ourselves.
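Canary releasing is a concrete example of deployment being separate from release: the code is already deployed everywhere, and a percentage controls who actually sees it. A minimal sketch, assuming users are bucketed deterministically by hashing their ID; the function and feature names are hypothetical, and a toggle platform would normally handle this for you.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically place a user in one of 100 buckets per feature.

    The same user always gets the same answer for a given feature, and raising
    `percent` only ever adds users, so nobody flips back and forth during a rollout.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


users = [f"user-{i}" for i in range(10_000)]
canary = [u for u in users if in_rollout(u, "new-invoicing", 10)]   # roughly 10% of users
```

Setting `percent` to 0 gives a dark launch (deployed, released to nobody), and 100 completes the release.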

For an overview of progressive delivery, James Governor from RedMonk has written a nice one over on the RedMonk blog. It's a really interesting idea. If you can find ways to separate deployment from release, it allows you to de-risk deployment so much better. It makes you much braver about making changes. You'll be able to make more frequent releases, and those releases will have much lower risk. Have a look into progressive delivery, but really the most important thing here is that the act of deployment is not the same thing as the act of release, and you can control how that release activity happens.

Data

I skipped over data, so I'm going to try to cover data in six minutes and seven seconds. Here we have our existing monolithic application, with our data locked away in our system. We've decided that we're going to extract our invoicing functionality, but, oh no, we need data, and the data is over here. What do we do? Option number one is to just go and get the data. For a short period of time, this is acceptable. In a situation where you're maybe not sure which is your trusted implementation, where you're switching between invoicing living in the monolith and invoicing living in the microservice, you likely want data compatibility and consistency across those two implementations. Long term, though, it's not acceptable, because one of the golden rules, after independent deployability, is thou shalt not share databases. Fundamentally, this comes down to the coupling it causes. Here's my shipping service. I've got some data in my database. If I allow somebody else to access my data directly, I've exposed my internal implementation details to an external party. That makes it hard for me, as a developer of the shipping service, to know what I can change safely. I haven't actually got any separation between what is shared and what is hidden.

Again, coming back to cutting-edge ideas from the 1970s, David Parnas developed a concept called information hiding, which is about how we think about modular decomposition. You want to hide as much information as possible inside the boundary of a module, or inside the boundary of a microservice. If instead I say, "No, if you want information from me, you have to come to a well-defined service interface," it allows me, as the developer of the shipping service, to have an explicit understanding of the contract I expose to the outside world. As long as I maintain that contract, I can do whatever I want inside my service. This is about allowing independent evolution and development of these services. Don't do direct database access, except in an extremely limited number of circumstances.

We don't want this. So what are we going to do? We've got two options. Let's imagine we've decided that the invoicing service is now good enough to be the real source of truth for invoicing. We need to get data. What do we do? Well, the first question is what kind of data you want. If the data you want is actually somebody else's data, then at the moment the only other owner of data is the monolith. So you go to the monolith and say, "Can I please have some information?" You create some kind of explicit service interface on the monolith itself, in this case an API, and fetch the data you want. This works really well at one level. If it's something like orders data, that makes sense: I'm not the order service, I'm the invoicing service. Orders lives in the monolith, so I'm going to come to you to get the data I want. What's really interesting is that as you start defining service interfaces on the monolith to expose that information, you start to see the shape of other prospective services emerge.
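The difference between reaching into the monolith's tables and going through an explicit interface can be sketched in-process. This is only an illustration under stated assumptions: in a real migration `MonolithOrdersApi` would sit behind an HTTP API, and the class names, fields, and table layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    """The published contract: only the fields the monolith chooses to expose."""
    order_id: str
    customer_id: str
    total_pence: int


class MonolithOrdersApi:
    """Explicit interface on the monolith; its internal tables stay private."""

    def __init__(self) -> None:
        # Internal representation: free to change as long as OrderSummary is honored.
        self._orders_table = {
            "o-1": {"customer": "c-9", "lines": [(2, 1299), (1, 500)], "internal_notes": "vip"},
        }

    def get_order_summary(self, order_id: str) -> OrderSummary:
        row = self._orders_table[order_id]
        total = sum(qty * unit_price for qty, unit_price in row["lines"])
        return OrderSummary(order_id, row["customer"], total)


class InvoicingService:
    """New service: talks to the contract, never to the monolith's database."""

    def __init__(self, orders_api: MonolithOrdersApi) -> None:
        self.orders_api = orders_api

    def create_invoice(self, order_id: str) -> dict:
        order = self.orders_api.get_order_summary(order_id)
        return {"customer_id": order.customer_id, "amount_pence": order.total_pence}


invoice = InvoicingService(MonolithOrdersApi()).create_invoice("o-1")
# invoice -> {"customer_id": "c-9", "amount_pence": 3098}
```

Fields such as `internal_notes` never leave the monolith, so its developers can change them without breaking invoicing.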

It's a lot like that bit in "Alien" where John Hurt's got the alien coming out of his stomach. You start to see the alien's little head creeping out of his stomach, and then it bursts out and he dies. A lot of microservice migrations are just like that: you start to see the shape of this horrific entity emerging from the monolith, but nonetheless, it might help you see, "OK, there's an order service just waiting to be freed from the vicious clutches of the monolith." Although in this context, the monolith would be John Hurt, and it would be dead.

Now, of course, we need to invert that situation. What if the data you want is actually your data? What if it's invoicing data? At that point, we've got to move the data over. This is where all the really hard stuff happens. Taking data out of an existing system, especially a relational database, causes a lot of pain, suffering, and anguish. I'm going to give you a very quick example of the kinds of challenges it can create, and I'm going to throw us right into the deep end: how we deal with joins.

Here we have an existing monolithic application for a system where we sell compact discs online. You can tell how long I've been using this example. We've got some catalog-related functionality, which knows how much something costs and stores information in our album table here. We've got the Best of Death Polka Volume 4, and the Best of Music. This is the kind of stuff we sell. Finance functionality manages our financial transactions, which we store in a ledger table. One of the things we need to do is generate a top 10 list of our bestsellers that week. In the monolith, that's a pretty straightforward join operation. We would do a select on our ledger table to pull back the top 10 bestsellers, limiting the select to 10 rows. That gives us the list of IDs, and we'd join out to the album table to get the details of each item. The problem is that once finance and catalog are split apart, that join out to the album table is not going to work very well.

Instead, we need to do the join operation in the application tier. When we move this over to services, we enter a very different world. How much I paid for stuff is over in the ledger table, and I pull my financial transactions back from the finance side. The items I've sold are over in the catalog. To generate that top 10 list, I'm going to have to pull back my bestsellers from the ledger, and then go to the catalog and pull back those records as well. All of our join operations go from being done in the relational tier up into the application tier. This becomes horrific in terms of latency. Where I was doing one single round trip for the join, I'm now doing one [inaudible 00:48:40] to pull back the top 10 IDs. Then I'm going over to the catalog service with those 10 IDs, which in turn goes to the catalog database saying, "Please, can I have the catalog items?" and then I get the response back. Join operations like this are horrendous in terms of latency. This sounds like a lot of fun, right? And we haven't even scratched the surface of the fact that we haven't got any data integrity in this situation, whereas in a relational database [inaudible 00:49:05] referential integrity.
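The shift from a database join to an application-tier join can be sketched with SQLite in-memory databases standing in for the monolith's shared database and for the separate finance and catalog databases. The album and ledger table names come from the example above; the columns (`sku`, `name`, `qty`) and helper names are hypothetical, and in a real system each query in the split version would be a call to a service API rather than direct database access.

```python
import sqlite3


def make_db(albums=None, sales=None) -> sqlite3.Connection:
    """In-memory stand-in for a database holding whichever tables are passed in."""
    conn = sqlite3.connect(":memory:")
    if albums is not None:
        conn.execute("CREATE TABLE album (sku TEXT PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO album VALUES (?, ?)", albums)
    if sales is not None:
        conn.execute("CREATE TABLE ledger (sku TEXT, qty INTEGER)")
        conn.executemany("INSERT INTO ledger VALUES (?, ?)", sales)
    return conn


def top_sellers_monolith(db: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    """Monolith: both tables share one database, so a single query does the join."""
    return db.execute(
        "SELECT a.name, SUM(l.qty) AS sold FROM ledger l JOIN album a ON a.sku = l.sku "
        "GROUP BY l.sku ORDER BY sold DESC, l.sku LIMIT ?", (limit,)).fetchall()


def top_sellers_services(finance: sqlite3.Connection, catalog: sqlite3.Connection,
                         limit: int = 10) -> list[tuple]:
    """Split data: two separate round trips, with the join done in application code."""
    # Round trip 1: the finance side returns the top IDs and quantities.
    top = finance.execute(
        "SELECT sku, SUM(qty) AS sold FROM ledger GROUP BY sku ORDER BY sold DESC, sku LIMIT ?",
        (limit,)).fetchall()
    if not top:
        return []
    skus = [sku for sku, _ in top]
    # Round trip 2: the catalog side looks up those IDs in one batched query.
    placeholders = ",".join("?" * len(skus))
    names = dict(catalog.execute(
        f"SELECT sku, name FROM album WHERE sku IN ({placeholders})", skus).fetchall())
    # No referential integrity across databases: a sold SKU may be missing from the catalog.
    return [(names.get(sku), sold) for sku, sold in top]


albums = [("a1", "Best of Death Polka Volume 4"), ("a2", "Best of Music")]
sales = [("a1", 3), ("a2", 5), ("a1", 1)]
monolith_result = top_sellers_monolith(make_db(albums, sales))
services_result = top_sellers_services(make_db(sales=sales), make_db(albums=albums))
# both -> [("Best of Music", 5), ("Best of Death Polka Volume 4", 4)]
```

Batching the catalog lookup into one `IN (...)` query keeps it to two round trips instead of eleven, but it is still two network calls where there used to be one query, and nothing stops the ledger from referring to a SKU the catalog no longer has.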

There's loads more information about how we solve these sorts of problems out there on the internet and on my blog. The one thing I want you to take away from this talk is: please buy my book. The second thing is that microservices should not be a default choice. You've got to think really carefully about whether they're right for you.

 


 

Recorded at:

May 05, 2020
