Understanding Complex Software Systems by Embracing Chaos


1. One of the things that you talked about in your session, which I attended, and I enjoyed very much, was embracing chaos. Just to get started, why don’t you explain for those who did not see your talk what you meant by embracing chaos?

Embracing chaos - this came directly out of my experience at Uber where you would have to describe the pace of growth as chaotic. We are adding people so quickly, and we are engaging in fierce competition around the world and this is a very, very chaotic way to do software engineering. It's not the kind of thing that you can plan out and make orderly decisions as much as these things are happening and we have to just make the best we can out of the situation.

It's not like the whole thing is on fire constantly. But it is not as efficient when we are hiring engineers very quickly. And these engineers are put to work right away and we want them to help us fix bugs and add features, but adding people that quickly creates a certain amount of inefficiency and that combined with the rapid growth of the business produces what I think you would have to call chaos. Your natural inclination as a software engineer is you want to use the software to make the chaos go away, to bring some order to the chaos and I don't know that that’s actually possible. Maybe it is. Maybe you can actually completely dominate this chaos with a nice orderly scientific equation or whatever.

But that's not been my experience and it is especially not true in our current environment and I feel the best we can do is attempt to just sort of live with that as the state of at least our business and I think increasingly our industry. I mean I think that Internet scale and breaking stuff, microservices and all these new languages and things are happening very quickly and I don't know that it's possible to get this sort of provably correct theoretical computer science kind of outcomes. I think the real world just is chaotic and I think the kind of systems that I think we should be building are the ones where we acknowledge chaos as a constant and we try to make the best outcomes we can from that.

Michael: Certainly, in areas like eventual consistency, we have explicitly embraced that chaos, but the questions that you raised, or at least implicit in what you're saying, are two issues: one which we sort of addressed is the people issue. Because very often in software it always comes down to people - you think it's a technical problem but it's really a people problem. And that people are not comfortable with chaos so from that point of view what you're saying is that we need an attitude adjustment so to speak and just be happy because people tend to want to go home at the end of the day and say “5 o’clock, I've solved the problem and it's gone” and you are essentially saying we have to stop living in that world.

Does embracing the chaos mean that you have to work all the time and you are forever fighting fires? I don't think so! That’s actually not what I mean. The distinction that I would like to make is that I don't think that you can beat it. I think it is just something that you have to build systems around, to make living with it more agreeable. So, yes, people are uncomfortable with it, because it sounds like not what you expect - not the expected behavior. You think that you write programs and they work, and you'll be able to get to the end of it. I think your natural expectation as an engineer is you will write some software and you'll deploy it and you win. Now you can go work on some other thing. But the modern reality is it’s much more complicated than that. So, I don't think it means that software doesn't work and these services don't work - obviously things work.

These big services like Uber and Airbnb and Google, these services obviously work but behind the scenes they are incredibly chaotic and I think that increasingly, our role as software engineers is to build in systems that allow for these chaotic behaviors to not leak out and become actual brokenness. But realistically if you step way back and you think “What are we actually doing as software engineers?” We are trying to make computers improve our lives, improve the quality of our lives as human beings and that does not mean being on call all the time, it does not mean being at the whim of just some disorganized chaotic system. It means helping people live better lives and we got to build systems. Having a real life I guess is the other thing.

Real life is chaotic. You do not control it, things are happening all over the place, and the world is a big old crazy mess, but there is no way we will ever order the world. And yet, as humans, part of maturing is you figure out how to embrace that and say “The world is crazy!” and actually sometimes you find some beauty in the craziness of the world. As you form a kind of mental model of actually living with that, it is OK. So I think this is what we are kind of trying to do with the software as well.


2. That then raises the question if we are trying to model chaos, do we actually understand what is actually happening in the architecture? I don't mean we are blindly writing code or blindly architecting stuff, but we set some principles down maybe we expect some metastable solution. But do we really understand at any point in time what the system is actually doing or do we just hope that we bounded the chaos somehow?

Can you ever understand these crazy systems that we are building? I guess it depends on what you mean by that and it depends on what you want to do with that answer. I think honestly a big reason that these things are so chaotic is because we don't understand them. If we had a deeper understanding maybe we could simplify some stuff, but that understanding comes with a cost. So it will take you some time to better understand the thing, and it will maybe take some more work to add some instrumentation, and maybe you’ll have to buy more computers to track more metrics.

Understanding the way that the system works is not free, and it's easy to overlook that because you might think the responsible thing would be to instrument everything so that everything is 100% knowable. But it’s sort of not practical, because you would end up needing way more computers to understand your system than to do the primary job that you wrote the software for in the first place.
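As a rough illustration of the cost trade-off described above, one common way to bound instrumentation overhead is to time only a sampled fraction of calls. The sketch below is not from the interview: the `SampledTimer` class, its `sample_rate` parameter, and the `handle_request` function are all hypothetical names chosen for the example.

```python
import functools
import random
import time
from collections import defaultdict


class SampledTimer:
    """Hypothetical helper: records durations for only a fraction of calls."""

    def __init__(self, sample_rate, rng=None):
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0 and 1")
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.calls = defaultdict(int)       # name -> total calls seen
        self.durations = defaultdict(list)  # name -> sampled durations (seconds)

    def timed(self, name):
        """Decorator that counts every call but times only sampled ones."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                self.calls[name] += 1
                # Unsampled calls skip the clock reads and the list append.
                if self.rng.random() >= self.sample_rate:
                    return func(*args, **kwargs)
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    self.durations[name].append(time.perf_counter() - start)
            return wrapper
        return decorator


if __name__ == "__main__":
    timer = SampledTimer(sample_rate=0.01)  # assumption: time roughly 1% of calls

    @timer.timed("handle_request")
    def handle_request(n):
        return sum(range(n))

    for _ in range(10_000):
        handle_request(100)
    print(timer.calls["handle_request"], len(timer.durations["handle_request"]))
```

Lowering `sample_rate` makes the measurements cheaper but also noisier, which is the same "information has a cost" balance in miniature.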

Michael: What I remember from graduate school is that information has a cost, and that perfect information is very often not worth paying the cost to get.

That is exactly the trade-off that I think we are faced with. There is a cost to having this information, and an interesting aspect of that is I feel that, culturally, our expectations have shifted in a way that may or may not be useful. Whether or not we’ve explicitly acknowledged that information has a cost, I think we've kind of culturally come to expect that you just won't know some things, or that you shouldn't even try to know, and this is a tricky balance. I find, especially with folks that are recently out of school, with fewer years of fighting production battles late at night with outages or whatever, that the value of truly understanding the way that your systems should work, that expectation that you would be able to get this understanding, is really not there.

People I find increasingly just don't even ask for it. Things are just uninstrumented or worse, uninstrumentable. And this is actually considered ok by a great many people. I think where it all flips is if these things that they're working on suddenly get a lot of users, take off and become larger concerns; then the fact that you need to understand these things at some level in order to solve problems becomes more obvious. But it is increasingly hard to get people to expect that you could understand things. And to be fair, in many cases it actually doesn’t matter. In many cases, if you have systems that are simple and have very few users, it’s probably not worth your time to try to understand how they work, because who cares.

Michael: There have always been anomalies and as long as they don't become unbounded anomalies you are fine.

Yes, but I think it's also fair to say that we have a lot of systems; our industry has a lot of systems that work for reasons that no one understands.

Michael: They are surprised they even work at all.

Because there are so many pieces, there are so many parts that are actually just not knowable by any one person. You can do your best, and try and get the bigger picture and understand those systems as a whole, but they are too big for one person to truly understand and that's just how things are. That's why we have abstraction.

Michael: Yes, and abstractions leak occasionally.

They certainly do.

Michael: Let’s take a step back from that. Maybe you could give us an example that is more relevant to Uber. Right now, as a result of terrorist acts, governments in the United States and other places are again demanding back doors that weaken computer security, but I don’t think that the legislators fully understand what that would do to ecommerce. For example, if there was a way to figure out what information was going across even though you had an encrypted connection, and that got exploited, Uber would not be in business, nor Amazon, etc. You could see the Internet economy collapse. So there has to be some way that you can educate the legislators, who sometimes seem to be maybe 10-20 years behind this rapid change in technology.

I think you said it exactly right. The lawmaking process is a slow one on purpose. That’s intentionally baked into the Constitution - at least in the United States - to actually slow it down, to make sure that if we want to change something, everybody is really sure that it is a good idea to change it. That becomes tricky with things like laws about technology, because technology moves very quickly. In fact, the point of technology is exactly the opposite: it should move even faster to eliminate inefficiencies. That’s rewarded, that’s what we pay more money for - things that save us time, that are more efficient. So that is going to be an increasing challenge. This is beyond the part of the business that I am directly involved in, but certainly I can see this would be very hard if you are a lawmaker.

Michael: But it does impact your part of the business (and this is something that I have contended for a long time), because you are the part of the business that understands how to respond to these issues and perhaps proactively think about systems that are antifragile against regulatory change.

That is a fair point. I think that if you look at the actual architecture that we are building out, it is intentionally aware that we might have new rules that we have to deal with, especially in different markets. This is specifically, for example, why we run our own datacenters: one thing that lets us do is cope if there is a regulatory environment that is harder to meet with a cloud provider for just various reasons …

Michael: Data has to stay in the country where it’s created.

Right. Then what if that country is some country that’s not very big and there is no obvious way to get your cloud provider to keep the data there - this is a real concern. That’s an example. I feel that our general approach has been to just try to be as nimble and flexible as possible to adapt to whatever else comes along, because who knows what future laws will exist that we will have to interoperate with. We are just trying to keep as many options open as possible. I mean that’s all we can do at this point.


3. Is there anything else that comes to your mind about this topic that I haven’t asked you or thoughts that went through your mind that didn’t come directly to questions that I asked?

I think that covers it in general. One higher-level observation is that something I would really like to see more of in the industry is way better performance tooling, and specifically education or just understanding of how to interpret it. I feel like there are actually some reasonably good tools and approaches for gathering data and visualizing some data, but most people are using very primitive tools and very primitive approaches. From my own personal life, I downplayed the importance of statistics when I was in school, because I was like “Statistics? Who needs that? I’m here to write software, I like compilers and operating systems, I don’t need statistics.” Now that I’m actually responsible for building and maintaining some of these large systems, it turns out statistics is as important as anything else, even as writing the actual software, because it’s the only way to understand what the software is actually doing.

Michael: Or understand the world. The software reflects the world, the world is statistical so the software has to be.

I wish that I had spent more time earlier on in my career on statistics and I wish that that was more widely understood - people should care about statistics. It helps you make software work.
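To make the point about statistics concrete, here is a small sketch (not from the interview) of why a single average can hide what a system is doing. The latency numbers are invented for illustration, and `percentile` is a hypothetical helper using the simple nearest-rank method; other percentile definitions interpolate and can give slightly different values.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Integer form of ceil(p * n / 100), avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Invented data: 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [10] * 98 + [1000] * 2

print(mean(latencies_ms))            # 29.8
print(percentile(latencies_ms, 50))  # 10
print(percentile(latencies_ms, 99))  # 1000
```

The mean of 29.8 ms describes no request that actually happened: the typical request took 10 ms, while the slowest ones took a full second, and only the percentiles show that.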


4. Now, what I would like to do is switch the tenor of the interview. There are some questions that I like to ask architects when I talk to them. So you might think of it as the architectural studio version of the Actors Studio. The questions are sort of adapted from them. What’s your favorite part of being an architect?

Well, it’s a funny question, because this role of architect, if you like, I feel the industry’s tide is turning on that term in general and it’s getting to be at the point where “architect” sounds a lot like thought leader. It’s one of these made up terms. It’s like something that you would make fun of somebody for saying that their title is. And I think there is an interesting reason behind that - I think there is still a very important role, a very valuable role for someone to understand how things fit together and that is what I really like to do. I like to understand how things fit together from a perspective of someone that has been in the trenches and has written tons of software and battled many bugs, but being able to connect the dots and help different teams work together to build better software is what I really like to do.

Michael: Know what the seams are and where to put the seams together.

The seams are where most of the interesting problems arise, it turns out. Usually, within those systems, everything is fine, but then interoperating with other systems has a big impact. I am not actually sure what that role will be called eventually, because I feel in a few years we won’t call it architect anymore, we’ll say some other thing, but it still will be the same thing.


5. What is your least favorite part of being an architect?

My least favorite part is that I don’t actually get to spend as much time writing actual software and seeing that software work. Earlier, when I was directly working on some individual projects, I would write a significant portion of that software, and that’s a really satisfying feeling. That’s why I got into this industry to begin with, because I like writing software. I guess what I am trying to say is it’s hard to complain about anything, because jobs working on software are awesome, they are super great jobs. If I have to pick something that I don’t like about it, it is really the most first world problem-y response, like “Sorry, I don’t like that, I don’t get the satisfying feeling of seeing my production, like boo-hoo, how terrible!” These are all good problems to have, but I guess if I had to pick something, then that would be it.


6. Is there anything creatively, spiritually or emotionally satisfying about architecture or being an architect?

Absolutely, there is. It is a different feeling of understanding and fulfillment that you get from being able to see these problems that are going to affect a lot more people than maybe even just one more team, being able to help people avoid these problems before they waste too much time building incompatible solutions or duplicating each other’s work or whatever. That’s a lot of leverage there, you can have a very big impact on an organization or on the product and that is very satisfying.


7. What turns you off about being an architect?

I don’t know. It’s the same as the other answer: it would be nice to write some more code again and learn some more stuff, as opposed to helping people debug problems and making a whole lot of diagrams on white boards.


8. Do you have any favorite technologies?

Yes, I do. But what I really have is a favorite class of technologies, and by that I mean I like things that tend to be breakable. If you saw my talk on the Uber architecture, I like things where you can kill them at any time and not worry that your users will notice. This spans many different kinds of technologies. You can build all sorts of systems that work this way, but when things are meant to work that way, I really like that stuff. Especially in the distributed database world, databases that are designed so that you can kill a node in the middle of the damn production and users will never notice, that’s pretty cool.
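The "kill a node and users never notice" property can be sketched in a few lines. This is a toy, in-memory illustration and not how Uber's systems or any particular database work: `Node`, `ReplicatedStore`, and the `kill` method are hypothetical names, writes go to every live replica, and reads fall through to the next replica when one is down. Real systems also have to re-sync nodes that come back and deal with consistency, which this sketch ignores.

```python
class Node:
    """Hypothetical storage node that can be killed at any time."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

    def kill(self):
        self.alive = False

    def put(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Writes to every live replica; reads from the first live replica."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def put(self, key, value):
        acks = 0
        for node in self.nodes:
            try:
                node.put(key, value)
                acks += 1
            except ConnectionError:
                continue  # skip dead replicas instead of failing the write
        if acks == 0:
            raise ConnectionError("no replicas available")
        return acks

    def get(self, key):
        for node in self.nodes:
            try:
                return node.get(key)
            except ConnectionError:
                continue  # fall through to the next replica
        raise ConnectionError("no replicas available")


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    store = ReplicatedStore([a, b, c])
    store.put("trip:42", "en route")
    a.kill()                      # a node dies in the middle of production
    print(store.get("trip:42"))   # the caller still gets an answer
```

The design choice worth noticing is that failure handling lives in the store, not in the caller: the code that asks for `trip:42` never has to know that a node died.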


9. Forgetting about your feelings about being an architect, what do you love about architecture?

I’ve been an individual contributor writing software for 20 years or so and I really like being able to take what I’ve learnt and help apply that to the new stuff that we are building, especially because of the leverage, the impact that you get by working on designs of systems. If you do it right, it can make a really big impact on the organization.


10. And again, thinking about architecture, and not being an architect, what do you hate about architecture?

I don’t know, it’s like one of those interview questions “What are your greatest strengths and greatest weaknesses?” - “My greatest weakness is I’m too hard of a worker. Sometimes I am too committed”.


11. What frustrates you?

The thing that I wish that I could change or fix is there is so much information out there that it’s really hard to know it all and be able to feel you are making truly informed decisions. Because things are moving so fast and there are so many potential options, potential different ways that you can build things that staying on top of it all is a big challenge.

Michael: I think some of what you are saying goes back to what you were talking about before that it’s not the end result that tells you whether you made a good or bad decision, it’s what information you had available at the time you made the decision that tells you if you did the right thing. It’s a great human tendency, and I think this has been written about by the behavioral economists, that we tend to judge results by was it a good or bad result as opposed to focusing on whether it was a good or bad decision.

Making decisions about things is hard. You always have limited information and something that I’ve learnt especially recently is you make the best decision that you can, but just being ok with the fact that there is just no way that you are going to make the optimal decision. You can only ever make the optimal decision with what you are working with and that just has to be ok, otherwise we can never decide anything. You’d be paralyzed forever because there are always more things you could learn.


12. What profession other than architecture - and I guess since you like being an individual contributor, it includes software developer - what other profession would you like to be, would you have picked that you would like to attempt?

Do you mean what I would have done instead of going into software at all, or what I would do now if I wasn’t doing software right now?

Michael: You could answer it either way. In other words, what is it that maybe 20 or 30 years from now you would do or what you would have liked to have done if you made a different choice.

A couple of things. I took a slight detour some years ago and I did an autonomous vehicle project with the DARPA Grand Challenge, I was on a team that did that. That was great, that was really cool. It is very much like a software problem. I had actually chosen to work on a bunch of the hardware aspect of it, which turns out to be not the most useful part of that project, but I just did it because it was interesting.

Michael: But if you did something in music, or being a bureaucrat or whatever.

Doing that was very cool, but professionally, if I wasn’t doing software, if I could just do whatever else, I would work on energy. I would work on figuring out some way to get energy that doesn’t destroy the planet. When I am not thinking about what software problems I am faced with, I am pretty regularly thinking about, or having discussions with other people about, “What about this energy thing?” and “Did you hear about these people? They have some new battery breakthrough.” I would probably do that.


13. Do you ever see yourself not being an architect anymore?

Yes, maybe. Like I said, I really do enjoy writing software. I think it’s hard to say what the future will hold, I might go and write some software again or I might choose that I want to, instead of writing software with my hands, write software with a team and go more down the management track. Maybe, I don’t know. I guess I could easily see it happening. I guess I sort of thought it would happen sooner - I never would have imagined that I would be doing software for this long, but here we are.


14. When a project is done, what do you like to hear from the clients or your team?

There is an interesting assumption in that question, which is that a project can ever be done. I’m having a really hard time thinking about projects that have ever been done. They always just kind of asymptotically approach done, but truly done is kind of a weird thing. Maybe a launch is as close to done as you can get.

What I like to hear is that everything is going well and people tend to not notice. The contribution that I’ve made is just not a thing that’s on anyone’s mind. It’s honestly the inverse, I think, of what you would expect, which is that people would be raving about and complimenting this work you’ve done, but I guess I consider it to be the best result if my work just fades into the background and people just use it and they depend on it and they don’t complain about it.

Michael: They don’t even ask who is that masked man?

They are just like, “Oh, yes, we have these things. Good thing we have these things because now we can worry about our other problems.”

Michael: Thank you very much.

Thanks a lot.

Jan 05, 2016
