
Chris Richardson on Design-Time Coupling in Microservices

Jun 21, 2021

In this episode of the InfoQ Podcast, Thomas Betts speaks with Chris Richardson about minimizing design-time coupling in a microservice architecture. Chris begins by defining design-time coupling, and contrasts it with runtime coupling. We then discuss some of the problems that arise from design-time coupling, anti-patterns and symptoms that are warning signs of high coupling, and the trade-offs that architects need to consider in their designs.

Key Takeaways

Runtime coupling impacts availability. Design-time coupling impacts productivity.
Design-time coupling appears when multiple services, and multiple teams, have to make changes in lockstep, such as an update to an API requiring all clients to be simultaneously updated.
The structure of an organization, the processes it uses to develop software, and the software architecture patterns it follows are all heavily intertwined. Minimizing design-time coupling requires teams to be loosely coupled.
Completely eliminating design-time coupling is often impractical. The goal is to minimize it.
The concept of coupling, and using modularity to reduce it, is not new, and can be found in the writings of David Parnas in 1972.


Transcript

Introduction [00:05]

Thomas Betts: Hello, and thank you for joining us for another episode of the InfoQ podcast. I'm Thomas Betts, co-host of the podcast, lead editor for architecture and design at InfoQ, and a senior principal software engineer at Blackbaud. In today's episode, I'm talking with Chris Richardson about minimizing design time coupling in a microservice architecture. Chris gave a presentation on the same subject at QCon Plus, and I really wanted to talk to him further and share his insights with our listeners. We'll start off with Chris defining design time coupling and contrasting it with runtime coupling. We'll then discuss some of the problems that arise from design time coupling, anti-patterns and symptoms that are warning signs when you have high coupling, and the trade-offs that architects need to consider in their designs. Chris, thank you for joining me, and welcome to the InfoQ podcast.

Chris Richardson: Oh, it's good to be here.

Design-time coupling vs. runtime coupling [00:49]

Thomas Betts: You spoke at QCon about design time coupling in microservices. Can you define design time coupling for us, and please feel free to go back and set the stage with why it's important, why it's something we should be thinking about.

Chris Richardson: Yes, and it's interesting because this whole notion of coupling, which is related to modularity and information hiding and encapsulation, is a concept that's as old as the hills. I mean, I think it goes back to at least 1972, when Parnas wrote his paper on the criteria to be used in decomposing systems into modules, or something like that. So in a sense, what we're talking about is an ancient concept applied to, you could say, a more modern software architecture, through the notion that coupling applies at all these different levels of the stack: between classes, between modules, and between services. So yeah, the fundamental idea is that in a microservice architecture, you've got a set of collaborating services. And collaboration implies, well, services need to communicate, and any time there's collaboration, some coupling exists, and there are a couple of different types of coupling.

Chris Richardson: And actually in presentations I've talked about three types, and this presentation focused on one. But for instance, there's runtime coupling and that occurs when one service, say the order service, handles a POST request to create an order by executing a PUT request against the customer service, waiting for it to respond before sending back the response to the create order request. Seems simple, but one of the issues there is by definition, the order service cannot send back a response if the customer service is not working. So you've got this runtime coupling, which reduces availability. And in that example, you might say, well, what's the big deal? But in a microservice architecture, you can end up having long chains of calls.
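
To make the runtime-coupling example concrete, here is a minimal Python sketch of an order service that synchronously calls a customer service before it can respond. The class names, the `reserve_credit` method, the toy credit check, and the status codes are hypothetical stand-ins for real HTTP endpoints; the only point is that a failure in the downstream service surfaces directly as a failure of the upstream request.

```python
class CustomerServiceUnavailable(Exception):
    """Raised when the (hypothetical) customer service cannot be reached."""


class CustomerServiceClient:
    """Hypothetical stand-in for an HTTP client that calls the customer service."""

    def __init__(self, available: bool = True) -> None:
        self.available = available

    def reserve_credit(self, customer_id: str, amount: int) -> bool:
        if not self.available:
            raise CustomerServiceUnavailable(customer_id)
        return amount <= 1000  # toy credit check


class OrderService:
    def __init__(self, customers: CustomerServiceClient) -> None:
        self.customers = customers

    def create_order(self, customer_id: str, amount: int) -> dict:
        # The order service blocks on the customer service before responding,
        # so an outage downstream becomes an outage here: runtime coupling.
        try:
            approved = self.customers.reserve_credit(customer_id, amount)
        except CustomerServiceUnavailable:
            return {"status": 503, "body": "customer service unavailable"}
        return {"status": 201, "body": "APPROVED" if approved else "REJECTED"}
```

With `CustomerServiceClient(available=False)`, every `create_order` call fails with a 503, even though nothing is wrong with the order service itself.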

Chris Richardson: I remember one time I was visiting a client and I met with a team that exposed an API that was integrating with a very large customer, and they had a very tight SLA to meet. And the engineer said, "Oh yeah, we just call the fraud detection service." So that seems simple. And then I go talk to the fraud detection team, and it turns out they've got seven services that implement fraud detection, all communicating using REST. So to respond to this request from their external client, there were eight services involved. And you do the math, and the overall availability is the product of the availability of all of those services. If you actually had 10 services, you'd lose one nine of availability. And that arose because no one really had a global view of the end-to-end flow. So that's runtime coupling, which impacts the availability of your system. So that's sort of the operational side of things.
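
The availability arithmetic can be sketched as follows, assuming that failures are independent and that a request succeeds only if every service in the synchronous chain is up. The per-service figure of 99.99% used in the usage note is an assumption for illustration, not a number from the conversation.

```python
import math
from typing import Iterable


def chain_availability(availabilities: Iterable[float]) -> float:
    """Availability of a request that needs every service in the chain to be up.

    Assumes independent failures, so the combined figure is the product.
    """
    return math.prod(availabilities)


def nines(availability: float) -> float:
    """Express an availability as a count of nines, e.g. 0.999 -> about 3."""
    return -math.log10(1 - availability)
```

Under that assumption, a chain of ten such services gives `chain_availability([0.9999] * 10)` of roughly 0.9990, about three nines instead of four, which is the lost nine Chris describes; the eight-service chain in his story comes out at roughly 99.92%.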

Chris Richardson: And then another type of coupling is design time coupling. When we talk about services being loosely coupled in order to enable teams, sort of through the application of Conway's law, to be loosely coupled and highly productive, what we're talking about there is design time coupling. So we want to minimize design time coupling. A simple case is the order service consuming the API of the customer service. In the general case, that could be a REST API, or it could simply be subscribing to events. And design time coupling is really the degree to which a change to one service impacts its clients. So imagine the internals of the customer service changed, which required a breaking change to the customer service's API. Then the order service has to be updated to reflect that change. And if those are owned by different teams, those two teams have to collaborate. They have to have a meeting to discuss those changes. And any time you have collaboration between teams, it takes orders of magnitude longer than collaboration within a team. Collaboration within a team is just the daily standup: quick, fast decision-making.
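
A breaking change of the kind Chris describes might look like the following Python sketch. The payload shapes and field names are hypothetical; they show how restructuring one field in the customer service's response breaks order-service code written against the old shape, forcing both teams to change in lockstep.

```python
# Hypothetical customer-service payloads before and after an internal
# refactoring leaked into its API as a breaking change.
CUSTOMER_V1 = {"customerId": "c1", "creditLimit": 1000}
CUSTOMER_V2 = {"customerId": "c1", "credit": {"limit": 1000, "currency": "USD"}}


def credit_limit_v1(customer: dict) -> int:
    """Order-service code written against the original contract."""
    return customer["creditLimit"]


def credit_limit_v2(customer: dict) -> int:
    """The rewrite the order team must ship in lockstep with the change."""
    return customer["credit"]["limit"]


def can_read(consumer, payload: dict) -> bool:
    """True if the consumer still works against the given payload."""
    try:
        consumer(payload)
        return True
    except KeyError:
        return False
```

Here `can_read(credit_limit_v1, CUSTOMER_V2)` is `False`: the customer team's change cannot ship until the order team has shipped `credit_limit_v2`.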

Chris Richardson: Whereas in a lot of organizations I visited around the world, you have to find a meeting room and that's really hard. So if you have services that are tightly coupled from a design time perspective, that means changes sort of ripple through your system requiring the other services to change in lockstep, and that can dramatically impact the productivity of the teams.

Thomas Betts: Earlier on I heard you say you used to talk about three types of coupling. What's the third one?

Chris Richardson: Oh yes. Well, I call that infrastructure coupling; I don't know if there's another, more widely accepted term for that. So if two services, say the order service and the customer service, are running on the same infrastructure or using the same infrastructure, there's the possibility of one service interfering with the other by simply consuming all available resources. So let's imagine the order service is the high-priority, mission-critical service, and the customer service is much less critical. If they're sharing the same database server, for example, the customer service could actually consume all of the resources of that database server and starve the order service, the mission-critical one, of the resources that it needs. So it's a sort of interference. And so another key design principle is that you want to separate out mission-critical functionality from less critical functionality, and have it running on its own dedicated infrastructure to eliminate this kind of coupling.

Thomas Betts: Got you. Well, we want to talk mostly today about design time coupling. And you quickly went past Conway's law, and I feel like that's a key thing to talk about in this situation, because we're talking about the people, the organizations, and how they interact. You have distributed systems, you have distributed teams, especially with people all working remotely these days. Where do you see the give and take between the structure of the organization, the process that's used to develop software, and then the architecture patterns that they use to develop this?

Chris Richardson: Yes. I mean, it's all heavily intertwined. One way of restating Conway's law is that the structure of the organization and the architecture of the software that it creates are actually mirrors of one another. And if the goal is to deliver software rapidly, frequently, and reliably, you need to have teams that are loosely coupled, which actually means they can get their work done and deliver changes without having to constantly coordinate with people on other teams. And the authors of the book Accelerate have discovered that that's a key characteristic of high-performing teams. So if you're finding yourself constantly in meetings, coordinating work, that's a symptom of having a tightly coupled organization. And given the global lack of meeting rooms, at least in the era prior to COVID, I get the sense that a lot of organizations are very tightly coupled. They're constantly coordinating and aligning as opposed to just writing code and delivering it into production. So you want to have a loosely coupled organization, and if you apply Conway's law style of thinking, that means you need a loosely coupled architecture, which specifically means loosely coupled from a design time perspective.

Thomas Betts: Yes, I know we weren't going to talk about infrastructure coupling, but the idea of a lack of meeting rooms, that resource allocation problem, how do you solve that? Well, that's not your actual problem, we don't need to build more meeting rooms, we need to have fewer meetings. We need to have more loosely coupled teams and that's what we should strive for.

Chris Richardson: I'll give you an example of that. I mean, this was a long time ago. Gosh, back in 2002, 2003, I was working on a device management server. The company was building a product in the mobile space, and it actually consisted of a client, some code that ran on the mobile device, and a server that did the device management. And I was leading the device management team. Obviously the client needs to talk to the server, so we actually had a face-to-face meeting with the client team where we brainstormed the API that we would use, which was a very thin API. And then we both went off and developed our respective parts of the system. They were actually in the UK and we were based in Silicon Valley, so we were separated by time zones and by space. But because we had a very tight API, a very narrow API that was well specified, we could work independently, hardly ever having to communicate.

Chris Richardson: And as far as I can remember, when we actually integrated our two pieces of work together, it basically worked. So we weren't in that mode of working where we were constantly having to meet to align and negotiate. And so in a sense, in this world of distributed teams, the more distributed you are, or the more remote a given team is, the more loosely coupled they should be. So once again, if you're finding yourself constantly in meetings with people who are 12 time zones away, that's a sign that you've got a tightly coupled organization and architecture.

Thomas Betts: I like the idea of the contract-first development, is one of the terms I've heard, that you agree on the contract and then you both agree to do your own implementations. You don't have to be aware of individual implementation details on either side of that. The client, the server, don't care, as long as that agreement is met on both sides.
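
Contract-first development can be sketched in Python with a `typing.Protocol` that both sides agree on before writing any implementation. The device-management names below (`DeviceManagementApi`, `report_status`, `pending_firmware`, `InMemoryDeviceManager`) are hypothetical and only loosely inspired by Chris's story; the design point is that the client code depends on the protocol, not on the server's class.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class DeviceStatus:
    device_id: str
    firmware_version: str


class DeviceManagementApi(Protocol):
    """The agreed contract (hypothetical). Both teams code against this."""

    def report_status(self, status: DeviceStatus) -> None: ...

    def pending_firmware(self, device_id: str) -> Optional[str]: ...


class InMemoryDeviceManager:
    """Server team's implementation; its internals can change freely."""

    def __init__(self, latest_firmware: str) -> None:
        self._latest = latest_firmware
        self._reported: Dict[str, str] = {}

    def report_status(self, status: DeviceStatus) -> None:
        self._reported[status.device_id] = status.firmware_version

    def pending_firmware(self, device_id: str) -> Optional[str]:
        # Offer the latest firmware unless the device already reported running it.
        current = self._reported.get(device_id)
        return None if current == self._latest else self._latest


def device_check_in(api: DeviceManagementApi, device_id: str,
                    version: str) -> Optional[str]:
    """Client team's code: depends only on the contract, never on the server class."""
    api.report_status(DeviceStatus(device_id, version))
    return api.pending_firmware(device_id)
```

Because `device_check_in` only sees the protocol, the server team could replace `InMemoryDeviceManager` with a completely different implementation without the client team changing anything, as long as the contract is honored.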

Chris Richardson: Yes. I think when we developed that device management product, the term might not have existed, but in a sense it was contract-first, right?

Thomas Betts: Right. The language hadn't caught up yet. And it seems to be a common pattern. We do the right thing, and then people come up with a term to describe it, and then eventually that term gets misused and misapplied to things it wasn't meant for.

The iceberg pattern - expose as little as possible [11:32]

Thomas Betts: What are some other tips, though? I mean, it seems like emphasizing that contract is important, and I believe in your presentation you spoke about only consuming what you need and only surfacing what you need, like the iceberg pattern. And then the other phrase was about consuming as little as possible.

Chris Richardson: Yes. I don't really have a metaphor for the other pattern. So we talk about icebergs, right? The conventional wisdom around icebergs is that the part of the iceberg that's above the water line is small compared to what's below. And I think that's a great metaphor for services in particular, but also, you could say, software modules more generally, where you want to minimize what's exposed, for the simple reason that what's hidden can be changed easily and what's exposed cannot, because changing it potentially impacts the consumers of your software module or your service. And there was a book by, what's his name, Ousterhout? I've never actually had to say his name out loud. A Philosophy of Software Design. It's a very thin book, but excellent. He talks about modules being deep rather than wide. And that's another way of expressing this iceberg principle.

Chris Richardson: The example I was using in my talk is an API like, say, Twilio. And I know that it actually has quite a rich API, but if you think about sending an SMS message with Twilio, that operation just has three parameters: the from phone number, the to phone number, and the message you want to send. So it's a very minimal API, but presumably behind that is a heck of a lot of complexity, because you can send the text message to people in 150 countries. And I think that's really complicated.
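
The iceberg idea can be illustrated with a hypothetical SMS service whose entire public surface is a three-parameter `send_sms` operation. This is not Twilio's actual API; the carrier routing table and the fixed-size message splitting are simplified, invented internals standing in for the much larger complexity a real provider hides below the water line.

```python
from typing import List, Tuple


class SmsService:
    """Hypothetical deep module: one narrow public operation over hidden detail.

    Only send_sms() is public. The underscore-prefixed members are internal and
    can change freely without affecting any caller.
    """

    # Hypothetical routing table; a real provider hides far more than this.
    _CARRIER_BY_PREFIX = {"+1": "north-america-carrier", "+44": "uk-carrier"}

    def __init__(self) -> None:
        self._outbox: List[Tuple[str, str, str, str]] = []
        self._sent = 0

    def send_sms(self, from_number: str, to_number: str, body: str) -> str:
        """The whole public surface: from, to, and body. Returns a message id."""
        carrier = self._route(to_number)
        for segment in self._split(body):
            self._outbox.append((carrier, from_number, to_number, segment))
        self._sent += 1
        return f"msg-{self._sent}"

    def _route(self, to_number: str) -> str:
        # Pick the carrier for the longest matching country-code prefix.
        for prefix in sorted(self._CARRIER_BY_PREFIX, key=len, reverse=True):
            if to_number.startswith(prefix):
                return self._CARRIER_BY_PREFIX[prefix]
        return "default-carrier"

    @staticmethod
    def _split(body: str, limit: int = 160) -> List[str]:
        # Simplification: chop long bodies into fixed-size segments.
        return [body[i:i + limit] for i in range(0, len(body), limit)] or [""]
```

Callers never see the routing or the segmenting, so both could be rewritten, or extended to many more countries, without changing a single caller.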

Thomas Betts: I would think so. I wouldn't want to write that. So you don't want 150 different APIs. You can see that being the other version, worst case scenario.

Chris Richardson: Yes. I mean, it's funny actually. I think going back to the original device management projects I worked on, I think we had to integrate with some SMS gateway and it was horrible and hacky. And actually that's come up a few times over the years. Anyway yeah, so presumably Twilio is hiding a massive amount of complexity behind a very narrow API, which is fantastic. Or Stripe. I mean, once again, the API is a bit more complicated than I'm saying, but the fundamental idea is like charge this credit card, and presumably behind that is a heck of a lot of complexity. And that's all been abstracted away and you've just got this very narrow interface, so [inaudible 00:14:21] designing services that way.

Thomas Betts: Yes. I think companies that have done that really well, Twilio and Stripe, where the API is their product to a large extent, they have to get it right, or their product wouldn't be successful. Do you see it as more of a challenge internally with companies because, oh, I can just call up Chris and we can talk about this any time. And so, because there's less of a barrier to having that communication, it can lead to tighter coupling?

Chris Richardson: Yes. And not only that: those companies are serving a very large number of customers, and they can't have individual conversations. Whereas you and I, we can agree on stuff, even though it might not really be a good idea. It just works for us at this point in time, but then perhaps something will change, and then it's like, oh yeah, that API design was suboptimal.

Consume as little as possible [15:12]

Thomas Betts: And then the flip side of that, if the messages you're receiving have a lot of information, and I think you talked about [inaudible 00:15:19] the data you want, and you said you don't have a good metaphor, but talk about that a little bit more.

Chris Richardson: Yes. It was kind of upside down iceberg, right?

Thomas Betts: Right. Right.

Chris Richardson: Kind of the mirror of exposing as little as possible is consuming as little as possible. And you could say there are two different aspects. The first is minimizing the number of other services upon which you depend, your outbound dependencies, because every one of those dependencies is a potential instability, or could potentially cause you to change at some point. And so you want to minimize the number of those dependencies, and you could say in a way the ideal number is zero. That reminds me of a thought, which I don't think I explicitly mentioned in the talk: maybe you could say an ideal architecture is one where you have an API gateway that just routes to a bunch of services, and those services don't communicate. In practice it doesn't work that way, but because there are no dependencies between those services, they have no reason to change. Or at least no reason to change because other services are changing.

Thomas Betts: So just to play devil's advocate, if the ideal number of dependencies is zero, haven't you just described a monolith?

Chris Richardson: Yes. Essentially, yes.

Thomas Betts: That's not a good design decision, except for the times when a monolith does make sense. But one of the arguments for a monolith is you have more freedom to change things that need to change frequently and fewer dependencies.

Chris Richardson: Yes. I mean, what's interesting is a monolith is composed of modules, and the issues of loose coupling are actually very much the same there internally; they're just not externalized. But yeah. And then the other aspect is, once you've minimized the number of dependencies you have, you want to minimize how much you consume from each one of those dependencies as well. So just as before it was about minimizing the surface area of the API you expose, you also want to minimize the surface area of what you consume.
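
Consuming as little as possible is often implemented with what is sometimes called the Tolerant Reader pattern: the consumer extracts only the fields it actually uses into its own small type and ignores the rest. The payload field names in this Python sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class CustomerView:
    """The order service's private view of a customer: only what it uses."""
    customer_id: str
    credit_limit: int


def to_customer_view(payload: Mapping[str, Any]) -> CustomerView:
    # Read just the two fields the order service needs and ignore everything
    # else, so unrelated additions or changes to the payload cannot break us.
    return CustomerView(
        customer_id=str(payload["customerId"]),
        credit_limit=int(payload["creditLimit"]),
    )
```

Because the order service never reads the other fields, the customer service can add, remove, or restructure them without forcing a change here; only a change to `customerId` or `creditLimit` would.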

The goal: minimize, don't eliminate coupling [17:26]

Thomas Betts: If there are all these problems with design time coupling, do you think it is reasonable to try and eliminate it, to try and get to that zero? Or is that an unreasonable goal? And if it isn't reasonable, then what are the trade-offs you consider along the way and say, okay, this is what we're willing to accept because going further would bring all this extra burden?

Chris Richardson: I think realistically, services end up having to share concepts. That's just sort of the way it is. But you can view designing a microservice architecture as a matter of allocating responsibilities, and another way of putting it is... I had this other talk, on dark matter and dark energy, that talked about repulsive forces and attractive forces that either encourage decomposition into services or resist decomposition into services. If you view it as a problem of grouping subdomains, where a subdomain is some chunk of functionality corresponding to a functional area of your business, the challenge of coming up with a microservice architecture is defining each service as a group of subdomains. So how do you solve this grouping problem? Basically, if you have an operation that spans multiple subdomains, it kind of encourages you to put them together inside the same service, because if an operation spans both of them, then there's effectively collaboration, and with it coupling, inside your system.

Chris Richardson: You want to put them together, but then there's the repulsive force of wanting to have these team-sized services that can be developed, tested, and deployed independently. So it's all sort of a matter of balancing out these opposing forces and designing the patterns of collaboration in a way that minimizes the amount of coupling. And you could say avoiding anti-patterns. One common anti-pattern is where you have a data service. So you've got your order management service that has the business logic, and you've got an order data service that just wraps the database; you see variations of that, right? And there you've split the responsibilities for managing orders across two services, which inherently implies collaboration. And so there's all this data that's being exposed through the order data service API to support the order service, but then it's accessible to other people. So it's kind of like this naked access service in a way.

Chris Richardson: Whereas if you put the two together, you eliminate the need for that public API for the order management logic to use and then the resulting service can just expose the actual order management functionality.
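
The data-service anti-pattern and its fix can be contrasted in a short Python sketch. Both classes are hypothetical and in-memory; the contrast is that `OrderDataService` exposes raw order records to any caller, while the merged `OrderService` keeps its storage private and exposes only order management operations.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Order:
    order_id: str
    customer_id: str
    total: int
    state: str = "PENDING"


class OrderDataService:
    """Anti-pattern sketch: a service that only wraps the orders table.

    Any client, not just the order management logic, can read and overwrite
    raw order records, so the record's shape has effectively become public API.
    """

    def __init__(self) -> None:
        self.rows: Dict[str, Order] = {}

    def get(self, order_id: str) -> Order:
        return self.rows[order_id]

    def put(self, order: Order) -> None:
        self.rows[order.order_id] = order


class OrderService:
    """Merged alternative: business logic and data live together, and only
    order management operations are exposed."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}  # private; free to change

    def create_order(self, order_id: str, customer_id: str, total: int) -> str:
        self._orders[order_id] = Order(order_id, customer_id, total)
        return self._orders[order_id].state

    def approve(self, order_id: str) -> str:
        # State rules are enforced here instead of by whoever writes the record.
        order = self._orders[order_id]
        if order.state != "PENDING":
            raise ValueError(f"cannot approve order in state {order.state}")
        order.state = "APPROVED"
        return order.state
```

In the merged version, the `Order` record can be restructured at will, because nothing outside `OrderService` ever sees it.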

Thomas Betts: Got you. So if I can restate that, to see if I understand it correctly: one of the things I see about microservices is that databases don't get shared between services. And when you describe the order data service, it sounds like you're just exposing the database so two services can call it, which seems like, well, why not just have them both call the database? And you're talking about putting order management, or order processing, whatever the name of the service is, into one service. You don't have two separate services that can't exist independently. That's your service boundary. That's your bounded context.

Chris Richardson: Yes. Yes.

Patterns and metrics to look for as indicators of design-time coupling [20:54]

Thomas Betts: That's one anti-pattern and we talked about meeting room availability being another one as an indicator that you're having too much design time coupling. Are there any other things people should watch out for that are symptoms that then leads them to, here's our actual problem?

Chris Richardson: Well, I think an interesting sort of metric, and I suppose it ultimately boils down to this, is whether implementing some feature X on your backlog requires lockstep changes across multiple services. And so, if you're constantly seeing a pair of services changing in lockstep, it's indicative of a design time coupling problem. I mean, you could argue sometimes that's okay, but if it's constant, if a very high percentage of the time these two services are changing together for the same reason, they probably should be the same service. And interestingly, another way of putting that is the ancient object-oriented design principle, the common closure principle: classes that change for the same reason should be packaged together.
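
The lockstep-change metric Chris suggests could be approximated from change history. This Python sketch assumes you can list, for each feature or pull request, the set of services it touched; how you extract that from your repositories is left open. For each pair of services it reports the fraction of changes touching either service that touched both (the Jaccard index). The threshold at which a pair should be merged is a judgment call the conversation does not specify.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def co_change_ratio(changes: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """For each pair of services, the fraction of changes touching either one
    that touched both.

    `changes` is a sequence of sets, each listing the services modified for
    one feature (for example, derived from commits or merged pull requests).
    """
    touched: Counter = Counter()   # service -> number of changes that touched it
    together: Counter = Counter()  # (a, b) -> number of changes that touched both
    for services in changes:
        touched.update(services)
        together.update(combinations(sorted(services), 2))
    return {
        (a, b): n / (touched[a] + touched[b] - n)
        for (a, b), n in together.items()
    }
```

For a history where the order and customer services appear together in three of the four changes that touch either of them, the pair scores 0.75, a strong hint that they may belong in one service.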

Thomas Betts: Right. Going back to the Conway's law discussion, that sometimes means one team might have to give up the service that they've built and owned, because the better design decision is a new combined service owned by another team. So this isn't just a technology problem; sometimes it takes an organizational solution, saying the better solution is to rearchitect our teams so we can have a better architecture.

Chris Richardson: Yes. People refer to that as the reverse Conway maneuver, where you have your architecture that's loosely coupled, and then that works backwards and impacts the organizational structure as well.

Thomas Betts: What are the other behaviors or tips, I guess, that you can tell people to look for? It sounds like DevOps; you mentioned Accelerate being a good book to read. Are there any other resources people can go to for how to set up their teams and how to set up their architecture?

Chris Richardson: There are sort of these classic, well, I wouldn't say classic, books that I like, but to me the Accelerate book is just an amazing resource. It kind of explains why we're doing what we're doing, and it's got all the data, the actual research to back it up, namely that being a high-performing software organization correlates with business success. And a high-performing team basically needs to be a loosely coupled team, which then translates back to a loosely coupled architecture. So yeah, it sort of draws a line from loosely coupled architecture to more money.

Thomas Betts: And you can't easily have one without the other, or one leads to the other.

Chris Richardson: Yes. Having said that, there are certainly software products out there that are highly profitable, but internally they're steaming piles of poo.

Thomas Betts: It's the duck swimming calmly on the water and underneath it's paddling its feet madly.

Chris Richardson: Yes. I mean, that does exist, but you could say that's perhaps due to the market dominance of the software vendor and not due to the internal agility of the organization. Whereas if you're a newer company, where you don't have the muscle of an established company, I think you have to do it well. Plus, I honestly believe that sooner or later, all of that tech debt is going to come back and bite you. Ultimately you will pay the price. So yeah, the Accelerate book is really good. I think related to that is the DevOps Handbook, because to me, I feel like that's the definition of what DevOps is all about and all the principles and practices that make up DevOps. Yes. And also Jez Humble and Dave Farley's Continuous Delivery book. I think that's a fantastic book. So those are all sort of about process. And then on the organization side of things, the Team Topologies book is really the go-to resource there.

Thomas Betts: What does Team Topologies talk about with this?

Chris Richardson: It talks about, well, it defines what a team is and why you should have these small teams, and some of the underlying psychology behind that. And then it talks about the different types of teams. You can have a stream-aligned team that's working on features or business functionality. And then there's a platform team, for example, that's creating the platform that those business teams are actually building upon, which could be, say, self-service deployment and continuous deployment infrastructure and so on and so forth, and there are some other team types. And then it covers the different ways in which teams can collaborate, such as collaboration, literally working side by side, which is useful sometimes. But a lot of the time it's more like X as a service, we're providing this as a service, which is a much more hands-off kind of pattern of collaboration. Yes, it really is a must-read book.

Orchestration vs. choreography when implementing the saga pattern [26:02]

Thomas Betts: We're almost out of time. I did have one last question I wanted to get to. And unfortunately it's something that could probably be an entire episode on its own, but I wanted to talk briefly about orchestration and choreography because I think those are two different ways of handling design time coupling. And can you give an example of each pattern and why it matters?

Chris Richardson: Yes. I use those terms in the context of the saga pattern. And it's like, well, what is a saga, and why? So we go back to my simple example where in order to create an order, you have to reserve credit in the customer service. In other words, create order actually spans multiple services. And you should avoid using distributed transactions, because they actually introduce runtime coupling. So you want to use a saga, which is a series of local ACID transactions in each one of the participating services, and a good way to coordinate that is through asynchronous messaging. There are different ways to implement it, but one example is: the request comes into the order service, it creates an order and publishes an order created event, which gets consumed by the customer service, which reserves credit and publishes a credit reserved event, which is then received by the order service, which then changes the state of the order to approved.

Chris Richardson: That's often thought of as an event-driven architecture. But what I would call that is a choreography-based saga. So services announce what they have done, which triggers actions in other services. And that's nice from a runtime coupling point of view because they're loosely coupled. There's no synchronous calls. No one's waiting for some other service to respond in any timely way. So it doesn't matter if the customer service is down, eventually it will come back up and any outstanding events will be processed and the credit will be reserved or rejected. So that's choreography-based sagas.
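
A choreography-based saga like the one Chris describes can be sketched with an in-memory event bus standing in for a real message broker. The `EventBus` class and the `CreditLimitExceeded` event are assumptions for illustration; `OrderCreated` and `CreditReserved` follow the events named in the conversation. Each handler represents one local transaction followed by an announcement of what happened.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-memory stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, dict]] = deque()

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Publishing only enqueues the event; the publisher never waits.
        self._queue.append((event_type, payload))

    def deliver_all(self) -> None:
        # Deliver queued events, including any published by the handlers.
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._subscribers[event_type]:
                handler(payload)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: Dict[str, str] = {}
        bus.subscribe("CreditReserved", self.on_credit_reserved)
        bus.subscribe("CreditLimitExceeded", self.on_credit_limit_exceeded)

    def create_order(self, order_id: str, customer_id: str, total: int) -> None:
        # Local transaction 1: create the order, then announce it.
        self.orders[order_id] = "PENDING"
        self.bus.publish("OrderCreated", {
            "order_id": order_id, "customer_id": customer_id, "total": total})

    def on_credit_reserved(self, event: dict) -> None:
        self.orders[event["order_id"]] = "APPROVED"

    def on_credit_limit_exceeded(self, event: dict) -> None:
        self.orders[event["order_id"]] = "REJECTED"


class CustomerService:
    def __init__(self, bus: EventBus, credit: Dict[str, int]) -> None:
        self.bus = bus
        self.credit = credit
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        # Local transaction 2: try to reserve credit, then announce the outcome.
        customer_id, total = event["customer_id"], event["total"]
        if self.credit[customer_id] >= total:
            self.credit[customer_id] -= total
            self.bus.publish("CreditReserved", {"order_id": event["order_id"]})
        else:
            self.bus.publish("CreditLimitExceeded", {"order_id": event["order_id"]})
```

Until `deliver_all` runs, new orders simply stay `PENDING`, which mirrors Chris's point that the customer service can be down for a while: the events wait and are processed when it comes back.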

Chris Richardson: And then there are orchestration-based sagas, which is where you have an orchestrator that is telling the participants what to do. And that, once again, can use messaging, but instead of events, which are a way of communicating what has happened, the orchestrator will send a command message to, say, the order service, telling it to create an order. The order service will then eventually send back a reply message saying that it's done that, at which point the orchestrator will tell the customer service to reserve credit, which it will do, and then send back a reply message. And then the orchestrator can tell the order service to approve the order. So it's all based on asynchronous messaging, but the style of the messages that are being exchanged is quite different. And you're right, this is actually a complex topic.
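
An orchestration-based version of the same saga might look like this. Again, everything is a hypothetical in-memory sketch: `run` stands in for the broker's command and reply channels, and the command and reply names (`CreateOrder`, `ReserveCredit`, `ApproveOrder`, `RejectOrder`, and so on) are illustrative rather than taken from any particular framework.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple

Message = Tuple[str, str, dict]  # (destination, command type, payload)
Reply = Tuple[str, dict]         # (reply type, payload)


class OrderService:
    _TRANSITIONS = {
        "CreateOrder": ("PENDING", "OrderCreated"),
        "ApproveOrder": ("APPROVED", "OrderApproved"),
        "RejectOrder": ("REJECTED", "OrderRejected"),
    }

    def __init__(self) -> None:
        self.orders: Dict[str, str] = {}

    def handle(self, command: str, payload: dict) -> Reply:
        # Each command is a local transaction followed by a reply message.
        state, reply = self._TRANSITIONS[command]
        self.orders[payload["order_id"]] = state
        return reply, payload


class CustomerService:
    def __init__(self, credit: Dict[str, int]) -> None:
        self.credit = credit

    def handle(self, command: str, payload: dict) -> Reply:
        if command != "ReserveCredit":
            raise ValueError(f"unknown command {command}")
        customer_id, total = payload["customer_id"], payload["total"]
        if self.credit[customer_id] < total:
            return "CreditReservationFailed", payload
        self.credit[customer_id] -= total
        return "CreditReserved", payload


class CreateOrderSaga:
    """Orchestrator: maps each reply to the next command it sends."""

    _NEXT = {
        "OrderCreated": ("customer", "ReserveCredit"),
        "CreditReserved": ("order", "ApproveOrder"),
        "CreditReservationFailed": ("order", "RejectOrder"),
    }

    def start(self, payload: dict) -> Message:
        return ("order", "CreateOrder", payload)

    def on_reply(self, reply: str, payload: dict) -> Optional[Message]:
        step = self._NEXT.get(reply)
        return None if step is None else (step[0], step[1], payload)


def run(saga: CreateOrderSaga, participants: Dict[str, Any], payload: dict) -> None:
    """Toy loop standing in for a broker's command and reply channels."""
    commands: Deque[Message] = deque([saga.start(payload)])
    while commands:
        destination, command, body = commands.popleft()
        reply, body = participants[destination].handle(command, body)
        next_command = saga.on_reply(reply, body)
        if next_command is not None:
            commands.append(next_command)
```

Note the direction of the dependencies in this sketch: `CreateOrderSaga` knows the participants' command and reply names, while `OrderService` and `CustomerService` know nothing about each other.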

Chris Richardson: The interesting thing with both choreography and orchestration is that, on the one hand, they're both loosely coupled from a runtime perspective, but from a design time perspective there is still coupling; it's just that the direction and the nature of the design time coupling are slightly different. So if you think about orchestration, the orchestrator is invoking the APIs of the participants, so it depends upon them. Whereas with how I described choreography, the participants are listening to events that the other participants are emitting, and so there is actually design time coupling between the participants. And whether or not that's a problem, or how well it works, I think does depend on the specific scenario.

Thomas Betts: I have plenty more questions, unfortunately, we're out of time. Chris, what else is coming up for you? And what's the best way for people to reach out?

Chris Richardson: What am I working on? I just do microservices. I think I've been talking about concepts like loose coupling, and God knows what, and similar related things for 15-odd years, actually. I mean, some of the same themes applied back in the POJOs in Action days, they really do. So yeah, people can find me, well, obviously on Twitter, @crichardson, and also at microservices.io. And then there's my consulting and training website, chrisrichardson.net. So there are a few different places you can find me.

Thomas Betts: Great. Well, thanks again for joining me today, Chris, and thank you to all of our listeners for joining us on another episode of the InfoQ podcast.


About the Guest

Chris Richardson is the creator of microservices.io, author of Microservices Patterns, and a Java Champion.

Chris Richardson is a developer and architect. He is a Java Champion, a JavaOne rock star and the author of POJOs in Action, which describes how to build enterprise Java applications with frameworks such as Spring and Hibernate. Chris was also the founder of the original CloudFoundry.com, an early Java PaaS for Amazon EC2. Today, he is a recognized thought leader in microservices and speaks regularly at international conferences. Chris is the creator of Microservices.io, a pattern language for microservices, and is the author of the book Microservices Patterns. He provides microservices consulting and training to organizations that are adopting the microservice architecture and is working on his third startup Eventuate, an application platform for developing transactional microservices.
