
Seven Ways to Fail at Microservices


Key Takeaways

  • Microservices are a means, not a goal
  • Being distributed does not guarantee being decoupled
  • Contract tests are an important part of any microservices architecture
  • Decomposition needs to happen in the front-end, back-end, and integration layer, as well as in the business logic
  • If a business does not have the ability to release microservices quickly and independently, many of the benefits of microservices will be lost

 

At QCon Plus last November, I presented some of the ways microservices can go wrong. I’m a consultant for IBM, and part of my job is helping businesses get cloud native. These problems are based on my experience – which, unfortunately, I see repeatedly in the field.

The first problem that I see is that we sometimes don't even know what the problem is. We feel we should be doing microservices, but we haven't really spent enough time defining why we are doing microservices.

What problem are we trying to solve? What's hurting us now? What's going to be better after we've done microservices? This is quite a natural human tendency, especially for us as techies. We want to jump to the solution. We want to play with the new shiny. Sadly, even though it’s really really important, figuring out what problem we are trying to solve is much less fun than solutioning.

Containers make this natural tendency to jump to solutions worse, because containers are a near-magical technology, which makes them a great solution. They're so lightweight. They're so portable. They make so many things so much better. We end up deciding “Because I've got these containers, it would be a terrible waste of the container capability to run my application in just one container. I should run it in as many containers as I can!” Unfortunately, “not having enough containers” is not a valid problem statement.

CV-Driven Development

Another problem that I see is CV-driven development. We look at our CV, and there's a big blank spot where it should say “microservices”. That’s no good, so we say, “I can fix this by rearchitecting my company's stack”.  You may be thinking, "No, that's just too cynical, Holly. Surely, no one would actually make architectural decisions based on their CV?" It turns out … they would.

Red Hat recently did a survey which looked at the main drivers for container-based development. Career progression was the number one driver. Career progression is a nicer way of saying CV-driven development.

A microservice-shaped gap on a CV is a big deal, because at the moment microservices are almost a new orthodoxy. Even if we're not taking part in the great resignation, even if we're not looking for a new job at the moment, we don’t want to be the odd one out. When we look around, it seems like everyone else is doing microservices. The natural thought is, if they're doing microservices, what's wrong with me if I'm not doing microservices? I call this “microservices envy”.

Microservices Are Not the Goal

Microservices envy is a problem, because microservices aren’t the sort of thing we should be envying. One of our consultants has a heuristic that if a client keeps talking about Netflix and asking for microservices, he knows the engagement is in trouble. Almost certainly, they’re not moving to microservices for the right reason. If the conversation is a bit deeper, and covers things like coupling and cohesion, then he knows they’re in the right space. 

The starting ambition for a microservices transformation should never be the microservices themselves. Microservices are the means to achieve a higher-level goal of business agility or resiliency or equivalent. Actually, microservices are not even the only means; they're a means. 

Distributed Monolith

It’s important to ask “do you have microservices, or do you have a monolith spread over hundreds of Git repos?” That, unfortunately, is what we often see. This is a distributed monolith, and it’s a terrible thing. It's hard to reason about. It's more prone to errors than its monolithic equivalent. With a conventional monolith where it's all contained in a single development environment, you get benefits such as compile-time checking and IDE refactoring support. Because you're always executing in a single process, you get guaranteed function execution. You don't have to worry about remembering the distributed computing fallacies and service discovery and handling cases where the thing that you're trying to call has stopped existing. Things are safer. If, on the other hand, we take away the safety of the monolith but leave the coupling, we end up with cloud-native spaghetti.
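To make that trade-off concrete, here is a minimal sketch, using plain JDK HTTP calls and an invented pricing-service endpoint, of what a simple in-process method call turns into once the callee lives in another process: nothing at compile time guarantees the target still exists, and the caller now has to handle timeouts and missing services itself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client: in the monolith this was just pricingService.priceFor(itemId),
// and the compiler guaranteed that the method existed.
public class PriceClient {

    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    public double priceFor(String itemId) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://pricing-service/prices/" + itemId))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The callee may have been renamed, moved, or stopped existing entirely;
        // none of that is visible until this call fails at runtime.
        if (response.statusCode() != 200) {
            throw new IOException("pricing-service returned " + response.statusCode());
        }
        return Double.parseDouble(response.body());
    }
}
```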

Distributed is not Equivalent to Decoupled

A few years ago, I was called into a troubled project for a rescue mission. When I landed, one of the first things the team said to me was “every time we change one microservice, another one breaks”. If you've been paying any attention to the promise of microservices, you know that this is the exact opposite of what is supposed to happen. Microservices are supposed to be independent of each other, decoupled. However, decoupling doesn't happen for free just because you distribute your system. ‘Distributed’ and ‘decoupled’ both start with D, but they're not the same thing.

It is quite possible to have a highly distributed system, with all the pain that comes from being distributed, while still being wholly entangled and coupled. This was what had happened in this case. When I started exploring the codebase, I kept seeing the same code over and over again in each repo. The object model for the application was pretty elaborate. There were about 20 classes, and some of those classes had 70 fields. That’s a complex schema.

One of the principles of microservices development is to let go of DRY and steer clear of common libraries, since they’re a source of coupling. In this case, to avoid the coupling of a central object library, each microservice had a cut-and-pasted copy of the object model in its code. But if the domain schema is still shared, there’s still coupling. Duplicating the object code doesn't eliminate the coupling, it just removes the possibility of compile-time checking. If a field name changes, it still breaks everybody, but the break doesn’t happen until runtime.
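As a small, hypothetical illustration of that runtime break (the Customer class, field names, and JSON are invented), the consumer's pasted copy of the model still uses the old field name, so nothing fails until real data arrives:

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DuplicatedModelDemo {

    // The consumer's cut-and-pasted copy of the shared domain class.
    public static class Customer {
        public String customerId;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        // JSON as the producer now emits it, after renaming customerId to customerRef.
        String producerJson = "{\"customerRef\":\"123\",\"name\":\"Ada\"}";

        ObjectMapper mapper = new ObjectMapper()
                // Service clients often relax unknown-property checking,
                // so the rename doesn't even throw.
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        Customer customer = mapper.readValue(producerJson, Customer.class);
        System.out.println("customerId = " + customer.customerId); // prints null
    }
}
```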

This sad story demonstrates the importance of domain-driven design principles in microservices. The ideal we’re going for is that each microservice maps neatly to a domain. A side-effect of this, and a sign you’re doing it right, is that the interfaces to your microservices are small. If we divide along technical boundaries, rather than domain boundaries, we end up with a situation like the one I saw: each microservice has a huge, brittle interface. The result is a fragmented spaghetti mess.

The Mars Climate Orbiter

Although it’s technically a spacecraft, rather than a microservices platform, the Mars Climate Orbiter nicely shows the distinction between being distributed and being decoupled. NASA launched the Mars Climate Orbiter in 1998, with a mission to study the Martian climate. Sadly, the Orbiter did not succeed in orbiting Mars; instead, the probe crashed into the planet. NASA’s postmortem found that the problem stemmed from the relationship between two different control systems, built by different teams. Most of the time, the steering was done by a system on the Orbiter itself. Every few days, when the Orbiter came into view of Earth, a supervisory control system in Florida would send out course corrections. This is about as distributed as a system can be; part of it was in space. But the domain was actually shared between the two systems: both were dealing with engine thrust calculations. The two teams hadn't been quite clear enough in their communication about what the interface looked like, and so they ended up using different units. The part in space used metric units, the part on Earth used imperial units, and disaster followed. We can safely say that in this case, the system was very distributed, and being distributed did not help.
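As a toy illustration (the numbers and method names are invented; only the conversion factor is real), the value crosses the interface without complaint, but its meaning does not:

```java
public class UnitMismatchDemo {

    // Hypothetical ground-side calculation, producing an impulse in pound-force seconds.
    static double groundImpulsePoundForceSeconds() {
        return 100.0;
    }

    public static void main(String[] args) {
        double value = groundImpulsePoundForceSeconds();

        // The flight side assumes the same number is already in newton-seconds.
        double assumedNewtonSeconds = value;

        // What the ground side actually meant: 1 lbf·s is roughly 4.45 N·s.
        double actualNewtonSeconds = value * 4.44822;

        System.out.printf("assumed %.1f N·s, actual %.1f N·s%n",
                assumedNewtonSeconds, actualNewtonSeconds);
        // The course correction is off by a factor of about 4.45.
    }
}
```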

Consumer-Driven Contract Testing

This kind of subtle communication issue happens all the time where multiple teams are involved. Happily, there is a good mitigation: consumer-driven contract testing. In systems where the IDE isn't helping us out with type checks, we need to test our integrations, but we want to keep full-blown integration tests to a minimum. Integration tests are heavy, expensive to run, brittle, and inherently coupley.  If we’ve invested in developing microservices, we don’t want to go backwards and make a big integrated monolith at test-time. So how do we get confidence we’re building something that actually works? 

Mocks are a common solution, but mocks on their own have a problem. To set up mocks, the producing team and the consuming team have a conversation at the beginning of development about what the interface looks like. They come to an agreement, and the consumers go away and write a mock that reflects their understanding of what the producing team said the service would look like. In the ideal case, they get it right. The problem is that they bake their assumptions into the mock, because they write the mock, and they're maybe not the best people to know what the other code looks like or how it behaves, because it's not their code.

In the happy case, they get it right. The unit tests all pass, and things continue to pass in the integration phase. Everything is good. Unfortunately, that's not always what happens. Sometimes, the actual implementation ends up being different from what the consuming team had understood, either because the producing team changed their minds or because someone somewhere made an assumption that was incorrect. In this case, the tests will still pass. However, when we actually integrate the real services, it's going to fail. The problem is that the behaviour of the mock is not validated against the real service. The producing team most likely never even see the mocks that have been created.

A better option is to have a consumer-driven contract test. The beauty of a contract test, and why it's different from a mock, is both sides interact with the contract test. For the consumer, the contract test acts as a handy mock.

On the other side, the contract test serves as a convenient functional test for the provider. It's a more profound validation than something like an OpenAPI check of the syntax; a contract test actually checks the semantics and behaviour as well. This saves the providing team time and effort writing functional tests.

If everything is compatible and functional, all the contract tests pass. This is a quick confidence boost, because they’re cheap and light to run. If the providing team breaks something, their tests will fail and provide an early alert, before the breaking change escapes to the integration environment. If the API changes, a new version of the contract is rolled out on both sides (or to a connecting broker). 

There are a few different contract testing systems out there. Spring Cloud Contract works really well if you're in the Spring ecosystem. If you're a bit more polyglot, then I really like Pact. It's got bindings for almost every language that you might be using.
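As a rough sketch of what this looks like with Pact's JUnit 5 bindings for the JVM (the provider name, consumer name, endpoint, and fields here are all invented), the consumer test below doubles as a mock for unit testing and records a contract file that the provider can later replay against its real implementation:

```java
import au.com.dius.pact.consumer.MockServer;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import static org.junit.jupiter.api.Assertions.assertEquals;

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "customer-service")
class CustomerClientPactTest {

    // The consumer's expectation of the provider, recorded as a contract.
    @Pact(consumer = "billing-service")
    RequestResponsePact customerExists(PactDslWithProvider builder) {
        return builder
                .given("customer 123 exists")
                .uponReceiving("a request for customer 123")
                    .path("/customers/123")
                    .method("GET")
                .willRespondWith()
                    .status(200)
                    .body(new PactDslJsonBody()
                            .stringType("customerId", "123")
                            .stringType("name", "Ada"))
                .toPact();
    }

    @Test
    void fetchesCustomer(MockServer mockServer) throws Exception {
        // The consumer's client code talks to the Pact mock server during the test.
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(mockServer.getUrl() + "/customers/123"))
                        .GET()
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        assertEquals(200, response.statusCode());
    }
}
```

On the provider side, the same contract is replayed against the real service (and typically shared via a Pact Broker), so a change that breaks the consumer's expectations fails the provider's build rather than the integration environment.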

The Enterprise Hairball

Of course, even if we sort out all our testing, and even if we have a beautiful set of decoupled microservices at the business logic layer, success is not guaranteed. There will be many other elements in our system that we maybe hadn't considered when we drew up our really clean microservices architecture. We get really excited about the business logic, and we forget the front and the back, and all the glue. The glue is particularly likely, and sticky, in enterprise architectures. One of our architects calls this the enterprise hairball. 

If we focus all our functional decomposition efforts on the business layer, we often end up with a bunch of neatly decoupled microservices, sandwiched between a monolithic front-end and a monolithic database layer. Change will be challenging in those kinds of systems. However, as an industry, we’re getting better at decomposing databases so that they map to individual microservices, and we’re developing micro-front-ends. 

But we’re not done with the decomposition. If the system is non-trivial, we will have an integration layer. This might be messaging, or it might be some other integration solution which pulls the complex system together. Even after the rest of the architecture modernizes, the integration layer is often still monolithic and inflexible. The integration team itself may be under significant load – a “panicked sandwich”, as my colleague put it. Because the integration layer is monolithic, they have to carefully schedule all changes, which blocks everybody else.

This can cause a lot of frustration, particularly for the integration team. To the outside, they can seem unresponsive and slow, even though they're working hard. To sort out the coupling, we need to adopt modular integration patterns. 

What happens if we don’t slice up the integration, database, and front-end layers? Almost certainly, our microservices won’t achieve what we want. Dependencies across parts of the hairball will prevent any part from moving with speed. The business layer microservices will not be independently deployable and the pace of deployment will be decidedly non-continuous. 

Drags That Hinder Releases

How many of you recognize this scenario? You work really hard, you've created something amazing. You know users will love it, but it’s not in their hands yet. Value is sitting on the shelf, and your amazing thing can’t be released. Even though you have a microservices architecture, you also have a release board. All the other microservices need to be released simultaneously, because they need to be tested together and it’s too expensive to do that except in a big batch. Even filling in the release checklist is expensive. The business is scared of releasing, because it's been burned by shoddy releases in the past. The release checklist and the release board and the single-threaded testing and other release incantations are all attempts to reduce the perceived risk. Because the release deadlines are common across the organisation, we end up having to race to cram features in before the deadline. That, of course, makes the release more risky. Someone somewhere is tracking a spreadsheet with all the dependencies between the microservices which are more coupled than they should be. And, of course, the moon has to be in the proper phase. This wasn't what we signed up for when we chose microservices! All of these well-intentioned processes are drags which prevent value reaching users, and often actually increase risk.

Test Automation

How did this happen? Usually, the reason that we're so scared of releasing is because there's a ton of manual work involved in a release. In particular, the tests that actually give us confidence aren't automated, so we need to do a lot of work to figure out if our application even works. When I visit a client and hear “our tests aren't automated”, what I hear is “we have no idea if our code works at the moment. It might work. It worked the last time we did manual QA; we hope it still works.” This is a sad situation.

If you care about it, automate it – and quality is something you should care about. Especially if the architecture has drifted towards spaghetti and coupling has crept in, breaks are likely. De-spaghettification is hard, so we want to be in a place of fast feedback, where we detect breaks as early as possible. If you're going to be spaghetti, at least be tested spaghetti.

The Release Cycle

Manual tests are only part of the manual process involved in a release.  In regulated or compliance-focussed industries, there’s almost always a pile of manual compliance work. Compliance is something we care about a lot - so we should automate it. 

With all these manual processes and all these slowdowns, what that really means is that even though we're deploying to the cloud, we're not getting the promise of the cloud. We're using the cloud as though it isn't a cloud. The irony is that in the cloud, things that we used to do, that used to be a good idea, that used to keep us safer, are actually hurting us. Old-style governance in the cloud doesn't work. It doesn't achieve the business outcomes that we were hoping for, and it loses a lot of the business benefits of the cloud.

It’s easy to spot whether a business is achieving the promise of the cloud by looking at release cycles. A few years ago, a colleague of mine had a sales call with a large legacy bank. Their lunch was getting eaten by fintechs and upstart challenger banks. The business could see why they were losing –  they couldn't move quickly enough to keep up. They came to us and explained that they had a large COBOL estate, and that was what was slowing them down. (That was quite possibly true.) They then added that they clearly needed to get rid of all that COBOL and move to microservices because everybody else was doing microservices. Then they added that their release board only met twice a year. At this point, my colleague’s heart sank. If your release board only meets every six months, you know your release cadence will be every six months. It doesn't matter how many independently deployable microservices you have. You're not going to get the agility.

The help this bank needed wasn’t really technical help; they needed to change how they thought about risk, and how they did ops, and their release planning needed a complete overhaul, and they needed a whole bunch of automation. The lack of continuous delivery discipline is what was holding them back, not the COBOL. 

“I want to be decomposed” is a common client request, but decomposed has more than one meaning. When we wish for a decomposed application, that doesn't guarantee modularity. Sometimes it just means that the mess is spread more widely. If there are external constraints holding us back, like release boards and antiquated processes, it doesn't matter how decomposed we are until we fix those.
