Randy Shoup on Microservices, the Reality of Conway's Law, and Evolutionary Architecture

Interview with Randy Shoup by Daniel Bryant on Jul 03, 2015

Bio Randy Shoup is Consulting CTO (former Google and eBay). Randy has worked as a senior technology leader and executive in Silicon Valley at companies ranging from small startups, to mid-sized places, to eBay and Google. In his consulting practice, he applies this experience to scaling the technology infrastructures and engineering organizations of his client companies.



1. We are here in CraftCon Budapest, Hungary and I am with Randy Shoup. You just had a great talk today, Randy, about moving from a monolith to microservices, based on your experience at Google and eBay. For the viewers at home, could you give us a brief high-level picture of what you have been talking about today?

Sure. Most of the big web sites that we know about – Amazon, eBay, Google, Twitter, etc. – started out as a monolith and have ultimately all gone through a process of convergent evolution, not coordinated in any way, and have ended up somewhere that we are starting to call polyglot microservices. So, I wanted to explore why that was the case. The first part was talking a bit about a monolith and why you might want to have one. I mean we say “monolith” and we mean it as a slur often, but the reality is that for most systems, or certainly most stages of companies, that is a perfectly appropriate architectural approach. You do not need to distribute if you do not need to. So, I talked a little bit about the pros and cons of monoliths and then flipped over to what it looks like in microservices and, again, the pros and cons of this: simple individual pieces, but now you have to deal with coordination among them. We talked a little bit about what it is like to be an owner of a service in a large-scale microservice ecosystem, like at Amazon, or at Google or at Netflix. Then we closed with some anti-patterns, which I have personally committed, in every sense of the word “committed”, and I talked about why those are inappropriate and how you could do something better.


2. You mentioned that a lot of your experience comes from large organizations such as Google and eBay. Do you think these lessons learned are directly applicable to smaller organizations, and is there, perhaps, a cutoff point in terms of the size of the organization?

That is an excellent question. Yes, I think they are applicable, but the better answer is that there is a spectrum, and the question is when it is appropriate to apply one or the other. It was not something that I addressed in this talk specifically, but on my SlideShare I talked briefly about phases of start-ups in particular and when it is appropriate to take one step or another. I can replay it quickly if you are interested. So, in the early phase of a start-up we do not even have a business model, we don’t have product-market fit, we do not have a product.

So, it is inappropriate, I think, to think about any architecture or even any technology. If a WordPress blog or buying ads on Google is the right way for you to test your hypothesis about how to move forward, you should totally do that and not build anything. Then there is a phase where we have product-market fit and we think people are willing to pay for it, and now we are trying to grow that business and, typically, that is a slower ramp than we would like. Again, that is a situation where we want to stay minimal. It is not about the technology and it is certainly not about scaling that technology or the organization.

We typically have a group of people that can fit around a conference table. This is not the point to split the architecture up into small services, divide into small teams, etc. That comes later! Right now we are one team and we are building one thing, the simplest thing that could possibly work. Then one hopes that you will start to hit the limits of the monolithic organization and the monolithic architecture, and that is what I call the scaling phase, where you hit a certain inflection point. That point seems to be, in company size and organization size, between the 20-25 person mark and the 100 person mark.

I don’t know personally why this is true, but I have observed that there are sort of stable points for organization size: everyone fits around a conference table – up to 20-25 – and the next point seems to be around 100. Up to that transition point you can make a single team work, even at 20-25. You are rickety, but you can still behave as if you have a single team with fluid roles and so on at the 20-25 mark. But as soon as you are beyond that, and certainly if you scale up to 100, you need to flip the organization and the technology to subdivide into teams with well-defined responsibilities, and that is a good point to switch from a monolithic approach to what I would term microservices.


3. That sounds really interesting actually, Randy. If the organization has the size to make that move, as you said, it has grown to the point in which they say “Now is the right time”, what do you think are the first steps either at a technical level or at an organizational level that they should take?

I am glad you asked it with both technical and organizational aspects in mind. Conway’s law teaches us that the organizational structure is reflected in the architecture. So, it is maybe a bit counter-intuitive, but when you are at the point where the monolith is slowing you down – not earlier – the first step you should take, or at least one coextensive with dealing with the technology, is to change the organization to the structure you want to have. So, subdivide into 3-5 person teams typically – “two-pizza” teams in the Amazon metaphor – and have them be responsible for individual parts.

That naturally lends itself to the technology being split up. Ok, there are lots of monoliths that are very highly coupled, in fact most of them, and so it is not a trivial exercise to break them up. So, as a practical matter, here is what I recommend to people and people that I consult with now – first, we think it is a good idea to move to the new model and so first, we have to agree to that. Step zero is to take a real customer problem, something with real customer benefit, maybe a reimplementation of something you have or ideally some new piece of functionality that was hard to do in the monolith.

Do that in the new way first. What you are trying to do there is to learn from the mistakes you will inevitably make going through that first transition, right? You do it in a relatively safe way, but at the same time you do it with real benefit at the end. If nothing else, you have produced real, tangible business value out of this work. That is step zero, and what we have done now is that we have gotten comfortable with the idea that we can do it, it hopefully achieves the goals that we were expecting of velocity and isolation and so on, and we have learned a lot. Next, we should go through the most important vertical slice, the one with the highest ROI and real customer benefit, and then keep going until you run out of patience.

So, that is how eBay did it. eBay went from version two, which was a monolithic C++ DLL, to more partitioned, individual Java applications. When it went through that transition, which overall took many years, it first did that step zero: more than a pilot, way more than a prototype, something tangibly and highly valuable was produced. Then eBay reverse-sorted the pages of the site by revenue and did the highest-revenue ones first (and they did a bunch of things in parallel), which seems a bit odd and a bit risky, except that you have already de-risked it by doing that step zero. So now you are saying “I only have limited resources to apply against this migration path over time and at some point I am going to run out of ROI, I am going to run out of organizational resources that I am interested in investing in this”.

That is OK, because you have done the biggest ones first. This certainly was true in 2010 or 2011 when I was last there, and it might still be true: there were still pages on the site that were on the V2 architecture, simply because they continued to work. They got 100,000 hits a day, no big deal, and they were neither painful enough to migrate nor had sufficient ROI to migrate. So, they just stayed, and they happily stayed.
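As an illustration that is not part of the interview, the "highest revenue first, stop when the resources run out" prioritization described above can be sketched in a few lines of Python. The `Page` fields, the cost unit, and the `plan_migration` helper are all hypothetical names invented for this sketch; the interview does not say how eBay estimated cost or tracked budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    name: str
    daily_revenue: float   # hypothetical figure used only for ranking
    migration_cost: float  # hypothetical effort estimate, e.g. engineer-weeks


def plan_migration(pages, budget):
    """Split pages into (migrate, leave), visiting highest revenue first.

    A page is migrated only if its cost fits the remaining budget;
    everything else stays on the old architecture.
    """
    migrate, leave = [], []
    remaining = budget
    # "Reverse-sort by revenue": biggest earners are considered first.
    for page in sorted(pages, key=lambda p: p.daily_revenue, reverse=True):
        if page.migration_cost <= remaining:
            migrate.append(page)
            remaining -= page.migration_cost
        else:
            leave.append(page)
    return migrate, leave


pages = [
    Page("help", daily_revenue=5, migration_cost=3),
    Page("checkout", daily_revenue=900, migration_cost=5),
    Page("search", daily_revenue=500, migration_cost=4),
]
migrate, leave = plan_migration(pages, budget=9)
print([p.name for p in migrate], [p.name for p in leave])
```

In this toy data the low-revenue page is the one left behind, which mirrors the point that some pages can happily stay on the old architecture.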


4. One thing that I found very interesting in the talk is that you mentioned that, say, with Google, the architecture was evolutionary and not necessarily by design. Now Amazon, Google and the like are known for having the best and the brightest on hand. Do you think more guidelines are perhaps required for small organizations?

Well, it is always nice to have the best and the brightest, but I think there are lots of good and bright all around. There are many more smart people that do not work at Google and Amazon than those who do work at Google or Amazon. So, I don’t think that is a real – I don’t worry too much about that. But, are there guidelines for smaller organizations? Absolutely. And again: the meta-point with all these things is “only solve problems that you actually have”.

So, it is great to be able to go to conferences. I think it is great to talk about these things, and maybe people find some value in listening to me talk about them, but I am increasingly trying to be very clear, when I describe what works well for eBay or Google, about why that is true, and everything is a trade-off, right? Google and Amazon are very intentionally trying to optimize for the velocity of large-scale organizations, which means lots of things moving in parallel with little coordination. They behave like 1,000 tiny companies rather than one monster company, and that is why those companies move fast while a bunch of even larger organizations do not. But in the case of smaller organizations, if you are all one team, do not subdivide. You should continue to be one team. If you do subdivide, into two or three teams, be pragmatic and subdivide the architecture step by step, not into 1,000 different things, but into two things, three things, ten things, and so on.

I think it is important to know that you are going to evolve. Again, every successful company has evolved, or I’ll say it another way: no successful company that I am aware of, that we have ever heard of, has the same architecture today that it had when it started. Don’t get too bitter and angry with yourself that the first thing you try is not the thing that lasts forever. In fact, if you had done the thing that was going to live for five or ten years when you started out, we would have probably never heard of you, because you would have spent all your time building for some far future that never came, rather than building things that met near-term customer needs in the near term.


5. One thing I picked up from your talk as well was the need to standardize in between microservices, the connectivity, if you’d like. Have you got any guidelines on how to lead or manage those standardization efforts?

Sure. I just want to repeat that part a bit, because the key thing is that so often large enterprises, many of which I have worked for, have this visceral idea that we should never duplicate effort and we should standardize on technologies and operating procedures and so on. One of the things that is maybe interesting to know about the Netflixes and the Amazons and the Googles of the world is that they tend not to standardize on the internals of services, right?

So, the service has a well-defined interface boundary that is isolated and encapsulated and modular and, within that, as long as the implementation respects the interface that it exports and has agreed to, it really does not matter what is inside. Is it Haskell? Is it Ruby? Is it Basic? It actually should not matter as long as it meets the outside needs, and that is what encapsulation and isolation actually mean. So, those big ecosystems do not standardize the internals of services. There are common conventions, and it is not like people are inventing new ways to do it all the time. But what you do need to standardize is the communication.
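To make the encapsulation point concrete, here is a minimal Python sketch that is not from the interview. It uses a single process and `typing.Protocol` as a stand-in for a network interface; `PriceService`, the two implementations, and `order_total` are hypothetical names chosen for illustration. The consumer depends only on the agreed interface, so the two implementations can differ arbitrarily inside.

```python
from typing import Iterable, Mapping, Protocol


class PriceService(Protocol):
    """The agreed interface: the only thing consumers may rely on."""

    def price_in_cents(self, item_id: str) -> int: ...


class TablePriceService:
    """One implementation: looks prices up in an in-memory table."""

    def __init__(self, table: Mapping[str, int]) -> None:
        self._table = dict(table)

    def price_in_cents(self, item_id: str) -> int:
        return self._table[item_id]


class FlatPriceService:
    """A completely different implementation: everything costs the same."""

    def __init__(self, flat_price: int) -> None:
        self._flat_price = flat_price

    def price_in_cents(self, item_id: str) -> int:
        return self._flat_price


def order_total(service: PriceService, item_ids: Iterable[str]) -> int:
    # The consumer only uses the interface, never the internals.
    return sum(service.price_in_cents(item_id) for item_id in item_ids)


print(order_total(TablePriceService({"apple": 50, "pear": 70}), ["apple", "pear"]))
print(order_total(FlatPriceService(100), ["a", "b", "c"]))
```

In a real ecosystem the two implementations could be written in different languages and sit behind a network call; the sketch only shows that the caller's code does not change.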

It is a big network and you need to standardize in some sense the arcs in the network, not the internals of the nodes. That should match our intuition about, for example, how economies work or how human interactions work. So, when we are having a conversation, we have to agree on a common language, like if I speak English and you speak Hungarian back – I do not personally speak Hungarian, unfortunately – that would not work. It is the same with economic interactions.

So, if you are a shop owner and I want to buy some fruit from you, we have this agreement: I am going to pay you in some currency and that has some meaning. But, by the same token – you did not really ask this – that does not mean that we have to have a global language or a global currency, because in reality we have neither a global language nor a global currency. We just need to agree on conventions for particular interactions. So, how do you deal with that? Well, I’ll describe what happens at the Amazons and the Googles, in that they often start with one initial thing, because they are small at the time. There is kind of one standard and everybody communicates with that standard, and if that is perfect, then it can always stay that way.

But over time they are going to learn that “Oh, I can make this faster and add more flow control”. I mean there are a bunch of things that you can add to a network protocol that solve problems that you have at scale. What happens in reality is that there comes to be a version II of the protocol, a version III of the protocol and so on, and over time those versions get adopted by more and more services, as those services need the capabilities of the new version or as the consumers of those services demand, in some sense, the capabilities that are in that protocol. So that is how it happens: evolutionarily more than by dictate.
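One common way to let several protocol versions coexist while services adopt them at their own pace is for each side to advertise the versions it supports and use the highest one they share. The interview does not describe how any particular company does this, so the sketch below is a hypothetical illustration; `negotiate_version` and the integer version numbers are assumptions.

```python
def negotiate_version(client_versions, server_versions):
    """Return the highest protocol version both sides support.

    Raises ValueError if the two sides share no version at all.
    """
    common = set(client_versions) & set(server_versions)
    if not common:
        raise ValueError("no protocol version in common")
    return max(common)


# A newer client talking to a service that has not adopted version 3 yet
# falls back to version 2.
print(negotiate_version({1, 2, 3}, {1, 2}))
```

With this kind of scheme nobody has to upgrade on a fixed date: a service moves to a newer version when it or its consumers need what that version offers.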


6. Thank you so much for the answer. That is great. So, the final concluding question is – What do you think is coming after microservices?

Yes, excellent. So, maybe I am insufficiently imaginative, but while microservices as a word is new, the concept is old. It is SOA done properly. Are there any other ways of organizing software? Of course there are. But there is a reason why the Amazons and the Googles and the Netflixes and the Gilts and the Yelps – you know, everybody – have ultimately rediscovered, through convergent evolution, this same general concept. So, I think microservices is a real thing. Maybe the word will die, but I think what will happen if we have this conversation in 3 or 4 years is that there will no longer be microservices in anybody’s talk titles. We will not be talking about microservices, because it is just going to be a thing that we do.

The analogy that I think of here is NoSQL. So, if we were having this conversation 3 or 4 years ago, when the hot topic was not Docker and microservices, neither of them existed, neither of the two words existed, but the hot topic was NoSQL systems and now, it is not that NoSQL systems go away, it is not that they are not important anymore, it is that now, the fact that Netflix uses Cassandra is not the subject of a talk, it is a line item: “Oh, we use Cassandra”. And that is sufficiently descriptive of that thing that we do not say much more about it. Anyway, I think that the next thing about microservices is that we will stop talking about microservices, but we will continue doing them.

Daniel: Thanks a lot, Randy. Thank you for your time.

Thank you.

Daniel: I appreciate you talking to InfoQ.

No worries.