BT
x Your opinion matters! Please fill in the InfoQ Survey about your reading habits!

Randy Shoup on Evolvable Systems
Recorded at:

Interview with Randy Shoup by Floyd Marinescu on Mar 09, 2011 | NOTICE: The next QCon is in San Francisco Nov 3-7, Join us!
25:41

Bio Randy Shoup is a Distinguished Architect in the eBay Marketplace Architecture group. Since 2004, he has been the primary architect for eBay's search infrastructure. Prior to eBay, Randy was Chief Architect at Tumbleweed Communications, and has also held a variety of software development and architecture roles at Oracle and Informatica.

QCon is a conference that is organized by the community, for the community.The result is a high quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics presented by the leaders in our community.QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.

   

1. Randy can you tell us a little bit about yourself and what you are working on these days?

Sure. I am the chief engineer for the search engine at eBay, and what I am mainly working on is the next iteration of that search engine, it’s a real time search engine, highly distributed it needs to be fast, so that’s what I do as my day job. I have been at eBay since 2004 so six years, I've had various architecture and engineering roles at eBay, and before that I was a chief architect for a company called Tumbleweed which did secure software for enterprises.

   

2. And you once talked about designing evolvable systems. Can you tell us at a high level what are some of the practices or areas that you want to talk about in your interview today?

Yes, sure. I think there are a couple of aspects to evolvable systems, I guess more generally large scale systems have to evolve, so there is no system of such a scale as eBay or Google or Amazon or Twitter, that stays the same it’s always changing. And so the trick is how do you design the system so it can be changed relatively straightforwardly? So, I tend to try to think about these problems and break them down into smaller chunks so they are more digestible. So I like to think about it in terms of how do you evolve data? How do you evolve processing or sort of business logic and then how do you evolve systems, moving from one system to another or migrating schemas and that sort of things.

   

3. Let’s start with data. So, how can you design for extensibility and evolvability in your data schema and data models?

It’s a very general problem and I don’t claim to have unique solutions to it but what doesn’t tend to work is a very hard coded rigid schema where everything is fixed in type and name and data. Large scale systems tend to evolve toward schemas which are much more flexible, key/value pairs are one example, which we use relatively heavily, self describing data, so basically again keys and values and types to the other, I mean there are lots of schemes, protocol buffers from Google, Avro in the Hadoop space, a bunch of different ways, XML is another great example of self described data.

But you will find that these large scale systems will come up with schemas that tend to have those properties so you will have bags of properties, bags of attributes, bags of key/value pairs attached to entities and that’s because existing entities always tend to grow the types of properties that are about them so in eBay the user today has a lot more things that we remember about the user that they did five years ago and the same is true of eBay items or purchases at eBay, all the entities that we manage have grown over time and the places where we have been able to grow the most easily and flexibly is where we have these very flexible schemas.

As a corollary sort of related aspect to that, we also have found that you want to make sure that - so one aspect is being able to add these kind of arbitrary key/value pairs or arbitrary data to entities but the other is making sure that the entities aren’t all supposed to have the same schema so for example even an item at eBay so a car for sale at eBay has very different properties than a dress for sale which is very different from a watch for sale which is different from a DVD for sale and so on - so designing a schema that allows heterogeneous objects even of the same entity type if you like, we found that to be very useful.

   

4. You use the word "Schema" in this context but it sounds pretty schema-less with different entities that are similar but completely different, completely sensible key/value pairs, so what does it actually look like at the infrastructure and code level?

There is no real magic here. It is a kind of mixture, maybe a semi schema style is the way to think about it, so schema-less would be bag of name/value pairs, nothing more. Our uses tend to be semi-schema if you like so there are a bunch of attributes about a particular entity that are always going to be there, an item is always going to have an ID, it’s always going to have a price, it’s always going to have a category, various things that we know, it’s always going to have a seller, but there are lots of more category specific attributes that would vary depending on whether it was media or a car or something like that. And those things tend to be bags of key/value pairs more or less.

   

5. What does it look like at the actual infrastructure level? I mean what kind of databases do you use, it sounds like non-relational, how does that map?

We do actually use relational databases, Oracle for our transactional storage and the fixed fields typically map to columns, and non fixed fields map to some flavor, I mean it’s a column, but some flavor of non structured or semi-structured column. So the individual key/value pairs in a relational database don’t map to individual columns right there, twisted around or flipped around so that they all live in one column, or maybe several columns at all in aggregate together are the sum total of the key/value pairs. For certain use cases, not the transactional ones, we do represent those types of things as a legitimate key/value store and we have developed one of our own, cause the world needs another one, on top of the MySql in-memory engine and we used that for very rapid high-write, high-read throughput use cases, but the main transactional storage is in relational databases for us.

But you are right, it’s not their normal form, it’s leveraging the relational database for lots of things that relational databases are good at and also putting in bags or blobs or however you want to think about it as self describing data as well.

   

6. Tell us about extensibility with respect to processing.

Sure, so the problem here that we are trying to solve is we have a flow of logic and we want to insert more logic into it and we want that inserting of new logic into it to be relatively clean and flexible and impact less of the system. One of the best strategies we have and it’s not at all unique to eBay is an event driven model so if you think about stages in a flow, somebody lists an item and then we check it for various validity checks, and then it gets inserted over here and then other aspects get updated over there, if you model all those different stages in that flow as literally different steps and you connect those stages by events, so here is a new item event, this item has been checked, and so on.

If you model those things as events you can chain together that processing very easily so you can have stage A then stage B then stage C then stage D. And it’s very easy if you think of it as a chain to insert a new one in there, so now you have A, B, B`, C and D. Well now that it’s a chain it can also become a tree, so A produces an event and B consumes it and does something interesting and maybe somebody else consumes it, X or Y or Z, all could consume that thing in parallel. So that’s the easiest and simplest way is to take a very complex workflow or a very complex set of processing that you want to do is to break it up into these smaller chunks and connect the chunks via these events.

That’s the general principle and we do that between systems and we often do that within a system so within a particular system we’ll think of the processing as a pipeline so again we get a new item and we do this processing and that processing and that processing and so on, and if we connect that via pipeline, and we configure the pipeline so imagine there is a kind of dependency injection scheme connecting up the stages in that pipeline via configuration rather than via fixed code. Then again you are a lot more flexible when you want to add a new stage to the processing you simply modify your configuration and go with that.

   

7. How did you do the choreography, what kind of tools do you need for that? Do you need like a message bus? How do you do them?

We have leveraged some enterprise service bus technology in-house and I don’t have a lot of detail about that that I can share. But the question is a valid one: anytime you have one of these event driven systems, the way to make it tractable and easy to think about is some kind of higher level choreography around it so yes, that is absolutely right.

   

8. And do you have to describe this model somewhere? Who owns the actual definition of the entire process? Which event goes where and where, that sort of thing?

That is an excellent question. Well the nice thing about events is that if the event describes something real and tangible than it’s very easy - events are a way of actually not having to think about the system as an overall flow if you see what I mean, it’s really good for actually decoupling different parts of the system, so you can imagine that there is one group or one system that accepts the user information and produces the new item event that actually happens, and then there are a whole bunch of down stream consumers all that are operating in parallel and all doing different interesting things that all consume that event.

And so you actually don’t have to have one team or one person or one representation of that entire flow you can really think about it piece by piece, because in this example the new item event is well understood what it means and it’s got well understood semantics and so on.

   

9. What are some of the pitfalls with taking this EDA approach. Is it possible to do too much? Ho do you know what kind of processes lend itself well to this, which ones should not be within a process flow like that?

Yes that’s a good question. I am not sure I have a magic answer about when to do what it’s all a case by case decision but you can do any good thing, too much of a good thing is too much. If a flow got to be a hundred steps, a thousand steps, it starts to become intractable and nobody can think or reason about a system like that. It’s possible to make events too - the granularity of the events is important to try to get right, I was a little bit alluding to that earlier, if it’s too coarse like an event that says something changed, that’s not all that useful and an event that’s super granular like this micro-field changed is also not as useful for that other reason. Event driven programming - I mean there is a reason why we are still talking about it decades after people thought of it, it’s because it’s hard, it can be hard.

You can help yourself by doing lots of good tooling around it and registries about what events are available, what their semantics are and so on and we do all of that. But people tend to think linearly, by default if you like. And so one challenge is trying to make these kind of asynchronous loosely coupled systems feel and seem more natural and of course - I say of course because I think of myself - I naturally tend to think in a sort of imperative straight way so it does take a mindset change.

   

10. How do you handle rollback when the process is asynchronous at each step?

Asynchrony makes rollbacks easier because depending on what went wrong, if you like, you may only have to replay one step, and then everything naturally flows from there without having to tell everybody. So let’s say we had a flow that went A-B-C-D and something went wrong with B, well we could replay the input event to B and then rerun B again and then maybe it’s correct the next time, and then C would just naturally happen and D would just naturally happen, without any explicit intervention, we wouldn’t have to explicitly tell C and D to do their jobs because their jobs are triggered asynchronously from the output of B if you see what I mean.

So again in the same way as this kind of event driven composing schemes make it easier to reason about the individual steps it also makes it easier to deal with the case when they fail. And it’s also more obvious that they can fail, and I hope I can express this idea, once you say that I am a consumer of an event and I produce this other event you are in the realm of knowing that producing and consuming can fail or take a long time and so it puts the developer’s mind in the right framework for dealing with these distributed systems problems if you see what I mean. That’s the flip side, it’s hard but once you get there it makes the other hardnesses about distributed system more obvious if that makes any sense.

   

11. Let’s talk about system changes. How does eBay handle migrating parts of the site, changing feature sets, breaking old feature sets, how do you do that?

Yes, as with any site of any size, eBay is constantly changing and we are constantly making those data changes like we mentioned and we are constantly making additional changes to processing. But we are also migrating systems so that could mean something as straight forward as moving an existing system from one physical location to another physical location, something like that, physically migrating a system, it could mean introducing a new piece of infrastructure, so we were on database A and now we are on database B. It could mean introducing something even larger, large scale revolutions, the general point in all of these is that if you want the site to stay up, and we do, no matter how large the change is that you want to do, you have to decompose it to much smaller steps, smaller changes and each of those changes needs to be both forward and backward compatible, in other words I need to be able to move forward but you always have to leave yourself a net, you always have to be able to come back to an earlier step.

So, let’s take an example, let’s say we want to migrate an application from database A to database B. So you have an application A which writes to database A and you ultimately want an application with code B that writes to database B. You can’t just flip a switch, the easiest way to do it is take an outage, tell all the users to go home and shut down A migrate everything over, check that it’s all healthy and then bring up B. First of, that really works, we have seen in the news some examples of that, but more generally you can’t do it, you just simply can’t do it in the 24/7 system, you can’t shut the system down. So, how do you do it? You have to break it up in individual steps.

Here is how it goes: the general principle is you need to be in a transition period from, when you are going from database A to database B you need to be writing to both places during the entire transition period that is the general scheme. So application A writes to database A now you start changing you start replacing A applications with B applications, so that the B application needs to both write to A in an A friendly way, same exact format same exact semantics, as when A was writing it and then it can also write in some new format or some new schema or whatever it is for the new system B. So B is doing what we call a dual write into the A side into the B side.

And as you keep moving along and replacing A applications with B applications, you are writing more and more into the new, you are writing both into the old and into the new. So your system is completely over to B, you have correct data in both database A and database B, and now at your leisure you can decide but at any time you could switch back to A and that’s the key point, because you have been writing to the database A you can turn that A application back on and it will behave correctly. So at some point you could have a transition to B and it could be after hours or days, or months you decide "Well we’ll never go back to A we are very comfortable B looks really good" and you can simply flip a switch in B turn off the idea of writing in the A database and you can decommission that database or do something else with it and then you are fully over on to the B scheme.

But again the general point is you have to always plan for this transition period and the transition period isn’t instantaneous and you always need to leave yourself the room to go forward or backward.

   

12. How do you abstract the complexity of writing to two apps at once from the application layer code?

That’s an excellent question. The flip answer is you don’t. You can abstract it at some level but you can’t make that problem completely go away at all levels. Depending on the type of change that you are making, some types of changes you can hide completely at the kind of data access layer, if it was simply migration as it was in my made up example of moving from A to B you might be able to hide that completely under a data access layer and no one above that would be any wiser. More typically though the application is aware of that migration and you just need to apply good engineering principles about not making every piece of code be aware of that but just isolating it to the parts that do need to know about it.

But typically the applications often do need to be aware and they certainly do need to be aware of that switch I mentioned of writing to both versus stopping writing in the old one if you see what I mean.

   

13. Do you have any tips for the architects to convince their managers to take on the cost of doing this slow migration approach, I am sure it must be quite expensive to plan for that style.

The downside is you are paying for double the writes, and all the infrastructure associated with that but the upside is you can have an available system. And again there are plenty of examples in history, recent history in particular where if when you do a system migration you don’t plan for being able to go back and that system migration doesn’t go perfectly which it won’t, it just simply won’t always, then you can be in real trouble. So I mean it’s a risk, why do we have multiple data centers? Why do we have redundant machines? Those all cost and they cost for a reason what we are paying for is availability and this scheme is exactly to guarantee availability during those large scale system changes. I like to think about it not so much as how can you afford to do it but how can you afford not to do it?

   

14. Are these the kinds of changes that you only do like from time to time at eBay? Or how often are you engaged in these incremental change upgrades?

Right, so for any given system we don’t tend to upgrade it constantly but if you take in the aggregate now you say tens of systems, hundred of systems, thousand of systems, at any given moment there is one or several these kind of large scale migrations going on at any time at eBay. So the way I like to think about it is on a large scale system, the transitional states that I mentioned those are the norm. And that is also another reason why application developers need to be aware of it because it’s normal that pieces of the system will fail and it’s also normal that pieces of the system are in flux. And if those aren’t handled correctly and typically that’s an application level job, then the system won’t work.

   

15. That makes sense for small changes but what if you are doing a complete system rewrite? How do you handle that?

Well, it’s the same answer it just takes a lot longer. We have done quite a few, depending on how you count, four or five different complete system rewrites at eBay and none of them was "Stop the whole site, rewrite everything come back a year and a half later", so the site was originally written by the founder in Perl over a long weekend in the story, then we migrated to C++, we were on C++ for quite a while then we moved to Java in 2002-2003 and we are on, depending on how you count, our second or third iteration of that Java infrastructure, and in all those cases we don’t redo all the pages, or all the functions of the site all at once, we typically take the hardest ones because those will find all the problems and we convert those and again we often do dual write strategies if we're also messing around with the backend.

But we take it piece by piece so at any given moment and this is actually true today part of the site is served by our second version of software still in C++, some of the site is served by or third version of software which is the earliest Java version, or earliest Java framework if you like to think of it that way, and then other parts of the site are served by later iterations and over time more and more of the site will be on later iterations and less and less of the site will be on the older ones. Typically it’s frankly a cost benefit analysis, if we think of the site function by function we can go what’s the benefit of converting this thing versus the cost of converting it? And if you like you can imagine sorting all the functions of the site by that order and just tackling them one by one.

   

16. Any parting thoughts about evolvable systems?

It’s something that everybody knows but doesn’t say all that often, I mean everybody who has had a system that has more than one version had to think about evolvability of code, of data, of systems. And yes I guess it would be nice to see more discussions about the principles or patterns about it and that’s a bit the idea of this conversation trying to get it started, because I think my sense is that there are some easily extractable patterns or nuggets here that are really widely applicable and it would great to be able to get more knowledge and interest and thought in developing those.

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT