Bio: Brian Zimmer is the lead of the Architecture and Infrastructure group at Orbitz Worldwide. Brian is responsible for directing the development of the company's distributed infrastructure, reviewing technical designs for adherence to architectural principles, and overseeing the creation of domain models and service APIs.
I am a senior architect at Orbitz, primarily responsible for distributed computing and for merchant services architecture and infrastructure, in particular moving our platform from our original domestic platform to our international platform, and all the work that has involved.
Sure. We started out as US domestic, 7 years ago. We were primarily air, and then we started building other product lines: hotels, cars, and packaging, so you can buy hotel and air together, but we had always been focused on US domestic. Then the business had two goals: one was to penetrate European and other markets, and the other was white-label capability, so we could run small sites or other sites on top of our existing platform and give them the same features we had. If a company came and said, "I'm interested in co-branding," we could help them do that. A lot of issues came up with that. We had issues with currencies: since we had been US domestic, we assumed that the consumer and the supplier shared a common currency. We had issues with content: everything was checked in as US English, and you manage that differently in a US-only world than you would in a more international world. And then there were things we wanted to change architecturally, such as how our services and our databases were laid out, which came about naturally as byproducts of moving from a domestic travel site to an international travel site.
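The currency assumption is easier to see in code. The following is a minimal Java sketch, assuming a hypothetical `Money` value type (not Orbitz code) that never lets an amount travel without its currency, so the "consumer and supplier share one currency" assumption cannot creep back in silently. The exchange rate is passed in by the caller; where rates come from is out of scope here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;
import java.util.Objects;

// Hypothetical value type: an amount is never separated from its currency.
final class Money {
    private final BigDecimal amount;
    private final Currency currency;

    private Money(BigDecimal amount, Currency currency) {
        this.currency = Objects.requireNonNull(currency);
        // Scale to the currency's minor units (2 for USD/EUR, 0 for JPY).
        int digits = Math.max(0, currency.getDefaultFractionDigits());
        this.amount = amount.setScale(digits, RoundingMode.HALF_EVEN);
    }

    static Money of(String amount, String currencyCode) {
        return new Money(new BigDecimal(amount), Currency.getInstance(currencyCode));
    }

    BigDecimal amount() { return amount; }
    Currency currency() { return currency; }

    // Adding amounts in different currencies is treated as a bug, not silently allowed.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException(
                    "Currency mismatch: " + currency + " vs " + other.currency);
        }
        return new Money(amount.add(other.amount), currency);
    }

    // Conversion is an explicit step; the rate source (hypothetical) is the caller's concern.
    Money convertTo(String targetCode, BigDecimal rate) {
        return new Money(amount.multiply(rate), Currency.getInstance(targetCode));
    }

    @Override
    public String toString() {
        return amount.toPlainString() + " " + currency.getCurrencyCode();
    }
}
```

With this shape, adding a EUR hotel rate to a USD air fare fails loudly instead of producing a meaningless total; showing a European rate to a US consumer requires an explicit conversion.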
The primary architecture of the domestic site was a standard web application, and that web application had pretty much three responsibilities. First, presentation. Second, it had to do a lot of the business logic, so it's a bit of a gray area as to whether the web app or the services tier did the business logic. And finally, persistence: it was really the master of record as well, so if you booked air, car and hotel, that web application was the primary master of record. Moving down, we had a series of Jini services that were responsible for search and book and the travel-related product features, and one tier down from there, another set of services whose job was to communicate with suppliers. If I need to talk to a GDS (global distribution system), which manages airline rates and availability, this service's job was to do that; same thing for hotel and car and so on. So it was a more traditional tiered application, with the web app owning the database.
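A rough sketch of that tiered split, assuming hypothetical interface and class names (`SupplierGateway`, `AirSearchService`, `WebApp`). In the real system the tiers were separate services communicating over the network, not objects in one JVM; the sketch only shows which tier depends on which.

```java
import java.util.List;
import java.util.stream.Collectors;

// Lowest tier (hypothetical): wraps one supplier system, such as a GDS or a hotel CRS.
interface SupplierGateway {
    List<String> availability(String origin, String destination);
}

// Middle tier: a product-level search service that delegates to supplier gateways.
final class AirSearchService {
    private final List<SupplierGateway> suppliers;

    AirSearchService(List<SupplierGateway> suppliers) {
        this.suppliers = List.copyOf(suppliers);
    }

    // Merges results from every supplier into one sorted list.
    List<String> search(String origin, String destination) {
        return suppliers.stream()
                .flatMap(s -> s.availability(origin, destination).stream())
                .sorted()
                .collect(Collectors.toList());
    }
}

// Top tier: the web application. In the domestic design it also held
// business logic and owned the master-of-record database.
final class WebApp {
    private final AirSearchService air;

    WebApp(AirSearchService air) { this.air = air; }

    String renderResults(String origin, String destination) {
        return String.join("\n", air.search(origin, destination));
    }
}
```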
I want to talk a little bit about suppliers. We depend on suppliers; they are the owners of the inventory. If we're going to try to get an airline seat, we actually go back to an airline's inventory system and try to book the seat there. If we're going to try to get you a hotel room, we go to the hotel's CRS (central reservation system) and try to book a room there. So we have this exposure to them, and one way of dealing with that in Java is to start up a bunch of threads. Let's say we're talking about HTTP: we've got a big thread pool, we spawn off all these threads to communicate with the suppliers, and we wait for the responses. When they don't come back, we need to handle that in a timely fashion and let the user know what is happening. So we turned to Java NIO to manage asynchronous communication over HTTP, and we've used java.util.concurrent to manage concurrency across synchronous calls in a more asynchronous fashion. Our concerns with the ... suppliers mean that we need to manage both the request-response relationship and the concurrency relationship we have with them. Java gives us threads, and we work within that, but we had to build all the solutions on top of it.
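As one way to picture best-effort supplier calls with java.util.concurrent, here is a minimal sketch, assuming each supplier call is wrapped in a `Callable` (the `SupplierFanOut` class is hypothetical). It uses `ExecutorService.invokeAll` with a timeout, which cancels any task still running at the deadline, and keeps whatever results arrived in time. It illustrates the thread-pool approach only, not an NIO-based HTTP layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: query all suppliers in parallel, keep what answers in time.
final class SupplierFanOut {
    private final ExecutorService pool;

    SupplierFanOut(ExecutorService pool) { this.pool = pool; }

    <T> List<T> bestEffort(List<Callable<T>> supplierCalls, long timeout, TimeUnit unit) {
        List<Future<T>> futures;
        try {
            // Blocks until all tasks finish or the timeout expires;
            // tasks still running at the deadline are cancelled.
            futures = pool.invokeAll(supplierCalls, timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return List.of();
        }
        List<T> results = new ArrayList<>();
        for (Future<T> f : futures) {
            try {
                results.add(f.get()); // every future is done here, so get() does not block
            } catch (CancellationException e) {
                // Supplier too slow: drop it and carry on (best effort).
            } catch (ExecutionException e) {
                // Supplier failed: a real system would record this for monitoring.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return results;
    }
}
```

A slow supplier then costs at most the deadline rather than the whole request. Supplier calls should respond to interruption so that cancellation actually frees the pool thread.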
Our primary use case for messaging or eventing is at the end of a booking, or really any transactional event: a booking, a cancellation, or an email I want to send. What typically happens is we put a message on a queue, because we're trying to notify a lot of parties that something is happening: the monitoring system, the fraud system, back-end processing or the email system. That's the primary place we use asynchronous messaging. The majority of our use cases for servicing user requests are done through request-response.
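The fan-out can be sketched in-process, assuming a hypothetical `BookingEventBus` in which each downstream consumer (fraud, email, monitoring) gets its own queue. A production system would use a message broker; the interview does not name one, so this is only a stand-in for the shape of the interaction.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical event published after a transactional action (booking, cancellation) commits.
final class BookingEvent {
    private final String type;
    private final String bookingId;

    BookingEvent(String type, String bookingId) {
        this.type = type;
        this.bookingId = bookingId;
    }

    String type() { return type; }
    String bookingId() { return bookingId; }
}

// In-process stand-in for a broker topic: every subscriber has its own queue,
// so a slow email sender never holds up fraud screening or monitoring.
final class BookingEventBus {
    private final Map<String, BlockingQueue<BookingEvent>> subscribers = new ConcurrentHashMap<>();

    BlockingQueue<BookingEvent> subscribe(String consumerName) {
        return subscribers.computeIfAbsent(consumerName, n -> new LinkedBlockingQueue<>());
    }

    // The booking request only enqueues; it does not wait for any consumer.
    void publish(BookingEvent event) {
        for (BlockingQueue<BookingEvent> queue : subscribers.values()) {
            queue.offer(event); // unbounded queue, so offer always succeeds
        }
    }
}
```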
6. You mentioned things like internationalization. The standard answer to that is to use property files and resource bundles, but I am assuming it's a little more complex in your case.
We do have a content management system, and it is responsible for managing that content, which is then rendered at the presentation tier into the appropriate international formats, with language-specific sorting and the other things we need to do. So we are a little beyond property files, but conceptually, yes: you extract the information to somewhere you can choose, at runtime, what the user wants to see.
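A minimal sketch of choosing content at runtime, assuming a hypothetical in-memory `ContentStore` in place of the real content management system. Content is keyed by locale, and lookup falls back from the full locale (fr-CA) to the language (fr) to a root default, similar in spirit to how `ResourceBundle` searches its candidate bundles.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a CMS: text stored per (key, locale), resolved per request.
final class ContentStore {
    private final Map<String, String> entries = new HashMap<>();

    void put(String key, Locale locale, String text) {
        entries.put(key + "|" + locale.toLanguageTag(), text);
    }

    // Most specific locale first, then language only, then the root default.
    Optional<String> resolve(String key, Locale requested) {
        Locale[] candidates = {
            requested,
            Locale.forLanguageTag(requested.getLanguage()),
            Locale.ROOT
        };
        for (Locale candidate : candidates) {
            String text = entries.get(key + "|" + candidate.toLanguageTag());
            if (text != null) {
                return Optional.of(text);
            }
        }
        return Optional.empty();
    }
}
```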
7. Moving along to re-architecting the platform, I am assuming there were a number of concerns and principles you wanted to work into the new design. What were some of those? Then we'll come back and dive into more detail on how you solved some of those issues.
Some of our concerns were around anticipating failure: making sure that at any point at runtime you can deal with a problem and anticipate how best to handle it, whether you can fail over or whether it's OK just to do best effort. So if I am going to go get rates, in one scenario it might be OK to just do best effort, because in many cases I can still return results. Another principle was more visibility into our application, so we really augmented our monitoring: as requests fly through the system, making calls to all these different services, we wanted more visibility into that. So we turned to complex event processing. This information is now streamed in real time to a complex event processor, which looks at it and helps us understand what's happening in this much bigger system. As we grow products and services, we need a better understanding of what is actually happening, so that was another area where we introduced much better technology in the platform re-architecture.
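CEP engines express rules over event streams declaratively; the Java below is only a toy stand-in for one such rule, assuming a hypothetical `CallEvent` emitted per service call and events arriving in time order. It fires when a single service records a given number of failures within a sliding time window, the kind of cross-request pattern that looking at individual requests would miss.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical event emitted for each service call as a request moves through the system.
final class CallEvent {
    final String service;
    final long timestampMillis;
    final boolean failed;

    CallEvent(String service, long timestampMillis, boolean failed) {
        this.service = service;
        this.timestampMillis = timestampMillis;
        this.failed = failed;
    }
}

// Toy version of one CEP rule: fire when a single service records at least
// `threshold` failures inside a sliding window of `windowMillis`.
// Assumes events arrive in timestamp order.
final class FailureWindowRule {
    private final long windowMillis;
    private final int threshold;
    private final Map<String, Deque<Long>> recentFailures = new HashMap<>();

    FailureWindowRule(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    // Returns true when this event causes the rule to fire for its service.
    boolean onEvent(CallEvent event) {
        if (!event.failed) {
            return false;
        }
        Deque<Long> window = recentFailures.computeIfAbsent(event.service, s -> new ArrayDeque<>());
        window.addLast(event.timestampMillis);
        // Evict failures that fall outside (now - windowMillis, now].
        long cutoff = event.timestampMillis - windowMillis;
        while (!window.isEmpty() && window.peekFirst() <= cutoff) {
            window.removeFirst();
        }
        return window.size() >= threshold;
    }
}
```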
8. Drilling into things like failover and latency, are there particular tools you've been able to use to help with that, or are these problems unique to your business, so that you have to architect something that lines up with your concerns?
We have introduced a new caching technology, which has helped us win back some of the latency we've been dealing with and given us better operational support. We have also re-tiered where things live. Take databases: before, everything that came through the web application, the front-end application, went to one database instance, so that one instance had to handle all sorts of travel concerns. Now we've partitioned that database into air, car, hotel and a generic master-of-record database, so that we can scale and manage each of them independently as we need to. So we've introduced those kinds of logical separations.
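Partitioning by product line means code has to pick the right database for each operation. A minimal sketch, assuming a hypothetical `PartitionRouter` and placeholder JDBC URLs; a real deployment would more likely hold one pooled `DataSource` per partition rather than raw URLs.

```java
import java.util.EnumMap;
import java.util.Map;

// The partitions described above: one database per product line plus the master of record.
enum ProductLine { AIR, CAR, HOTEL, MASTER_RECORD }

// Hypothetical router: each product line maps to its own database, so one
// partition can be scaled or maintained without touching the others.
final class PartitionRouter {
    private final Map<ProductLine, String> jdbcUrls = new EnumMap<>(ProductLine.class);

    PartitionRouter register(ProductLine line, String jdbcUrl) {
        jdbcUrls.put(line, jdbcUrl);
        return this;
    }

    String urlFor(ProductLine line) {
        String url = jdbcUrls.get(line);
        if (url == null) {
            throw new IllegalStateException("No database configured for " + line);
        }
        return url;
    }
}
```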
We use portions of it. At the web application tier we use a servlet container, Tomcat, and we use JSP, so we take advantage of some of that. We have bits and pieces of other things we've used. We don't use distributed transactions. For the most part we use Spring and Jini for wiring things together, so we're not really taking advantage of JEE there. We have turned to the concurrency libraries for all of our thread pooling and thread work management, so we're not making much use of JEE there either. So traditional JEE is not a big part of our architecture. That said, we do have some older applications that make much heavier use of it, but our newer architecture has moved away from it.
10. Along those lines, you mentioned not using distributed transactions. Beyond how much of the JEE stack you use, there is a whole list of norms that architects and developers are told to follow, but when you get to larger applications you start breaking the rules. Are there other cases like that, in synchronization or elsewhere, where you had to go a different direction?
I guess things like database foreign keys. Within the new platform we're making use of softer foreign keys: we have a more resource-based architecture, where you have an airline itinerary and a hotel reservation, and a master record that joins the two, but that join is application managed as opposed to database managed.
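A sketch of an application-managed ("soft") foreign key, assuming hypothetical `ResourceStore`, `MasterRecord` and `MasterRecordService` types. The master record holds only the IDs of the air itinerary and hotel reservation, which may live in separate databases, and the application, rather than a database constraint, checks that each ID resolves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store for one independently owned resource type (e.g. the air or hotel database).
final class ResourceStore {
    private final Map<String, String> byId = new HashMap<>();

    void save(String id, String payload) { byId.put(id, payload); }
    boolean exists(String id) { return byId.containsKey(id); }
}

// Master record joining resources by ID only: "soft" foreign keys.
final class MasterRecord {
    final String tripId;
    final List<String> airItineraryIds = new ArrayList<>();
    final List<String> hotelReservationIds = new ArrayList<>();

    MasterRecord(String tripId) { this.tripId = tripId; }
}

final class MasterRecordService {
    private final ResourceStore air;
    private final ResourceStore hotel;

    MasterRecordService(ResourceStore air, ResourceStore hotel) {
        this.air = air;
        this.hotel = hotel;
    }

    // The application enforces referential integrity: returns references that
    // do not resolve; an empty list means the record is consistent.
    List<String> danglingReferences(MasterRecord record) {
        List<String> missing = new ArrayList<>();
        for (String id : record.airItineraryIds) {
            if (!air.exists(id)) missing.add("air:" + id);
        }
        for (String id : record.hotelReservationIds) {
            if (!hotel.exists(id)) missing.add("hotel:" + id);
        }
        return missing;
    }
}
```

Because no database enforces the link, a check like this would have to run when the master record is written, in a periodic reconciliation job, or both.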
11. Another thing with an application like Orbitz: there are a variety of states in play, such as get these results, get the next set of results, aggregate them, prepare for pricing. How do you handle state, given the volume, as opposed to keeping state for one simple transaction, one person requesting something?
Our services are designed to be stateless. When they do need to manage state, the traditional approach for us is to push that state out to a cache and let the cache manage it. That is partly true at the web app as well: we have sticky sessions between the web app and our front-end load balancers, and the session lives there, but we replicate it to cache. When you come down to the services, we do the same thing: when you make a request for a flight and hotel package, we go out, find all of those options, bring back all the combinations, which we call packages, and manage those off in cache. So when the next request comes in, we can pick them up and paginate for you, or let you choose one of those products to book. So our state management is generally done off in cache.
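A minimal sketch of that pattern, assuming a hypothetical `PackageService` and using a `ConcurrentHashMap` as a stand-in for the distributed cache (the interview does not name the product). The service itself holds no state: the search result is stored under a search ID, so any instance sharing the cache can serve the next page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stateless service: all per-search state lives in the shared cache.
// A real distributed cache would also expire entries; this stand-in never does.
final class PackageService {
    private final Map<String, List<String>> cache;

    PackageService(Map<String, List<String>> sharedCache) { this.cache = sharedCache; }

    // Builds every flight x hotel combination once and returns the key to find it again.
    String search(List<String> flights, List<String> hotels) {
        List<String> packages = new ArrayList<>();
        for (String flight : flights) {
            for (String hotel : hotels) {
                packages.add(flight + " + " + hotel);
            }
        }
        String searchId = UUID.randomUUID().toString();
        cache.put(searchId, List.copyOf(packages));
        return searchId;
    }

    // Later requests, possibly handled by another instance, page through the cached result.
    List<String> page(String searchId, int pageNumber, int pageSize) {
        List<String> packages = cache.get(searchId);
        if (packages == null) {
            throw new IllegalStateException("Search expired or unknown: " + searchId);
        }
        int from = Math.min(pageNumber * pageSize, packages.size());
        int to = Math.min(from + pageSize, packages.size());
        return packages.subList(from, to);
    }
}
```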
12. Changing gears a little bit: you were also the lead of the Jython project for a little over a year. What do you think of this whole resurgence of Java as a platform, with languages such as JRuby and Groovy, and even a resurgence of Jython in the last year?
Obviously I think it's fantastic. It's great to see the community adopting these languages, especially dynamic languages. I am a big fan of dynamic languages, and I think being able to marry the two platforms, the dynamic scripting language and the widely available, accessible JVM, is a powerful combination, so I am particularly excited about it. I think there are areas where they need to overcome some problems ... time. I saw a post recently about how slow startup is, and that's always been the case. If they can overcome some of those areas it would be fantastic. But I think one of the interesting models is to take Java applications and the containers available to people, Tomcat and Spring, and embed these languages inside them, so that areas where you don't need to spend all your time writing Java code can be better expressed in Ruby or Jython or whatever. And I think that maturation will continue to the point where it won't be "I am running Jython" or "I am running JRuby", but a melding of all these technologies together. I am excited about it.
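One standard way to embed a scripting language in a Java application is the `javax.script` (JSR 223) API. The sketch below assumes a hypothetical `ScriptHost` wrapper; which engines are available depends entirely on the classpath. Jython, for instance, only provides its engine when its jar is present, so the code treats a missing engine as an expected case rather than an error.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical wrapper for running script snippets inside a Java application.
final class ScriptHost {
    private final ScriptEngineManager manager = new ScriptEngineManager();

    // Evaluates `source` with one Java object bound as `varName`.
    // Returns empty if no engine with that name is on the classpath.
    Optional<Object> eval(String engineName, String source, String varName, Object value) {
        ScriptEngine engine = manager.getEngineByName(engineName);
        if (engine == null) {
            return Optional.empty();
        }
        engine.put(varName, value); // expose a Java object to the script
        try {
            return Optional.ofNullable(engine.eval(source));
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Script failed: " + e.getMessage(), e);
        }
    }
}
```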
13. Beyond startup time, is there anything else needed to make Java a viable platform for these languages? I know there has been some talk about invokedynamic ... and things like that. Do you want to comment on that, or on anything else? Being a Jython lead, I am sure you know the whole list of brick walls you used to run into. What needs to be there?
There are a couple of areas. One is simply taking a language that did not start on the JVM over to the JVM. I believe this is true: Python has a reference implementation, and Jython is always trying to match that reference implementation. There is not necessarily a spec you're coding to, so one of the problems, which is not really a JVM issue, is keeping up, as these languages mature, with the features that change in the reference implementation, or with the assumptions the reference implementation makes. One example: we wanted to add asynchronous IO to Jython, but at the time the platform we were targeting didn't have NIO. The CPython version did, and the way CPython implemented it was very much in the flavor of C, which is not at all the way the JVM chooses to do it. So you are trying to marry these two worlds and build a common language on top of them, and with no spec drawing the abstraction, it became a bit of a challenge. And there are certain things you just can't do, like the JVM ... away some of the file system semantics; changing directories, for example, doesn't really work on the JVM, yet scripting languages definitely tend to have that feature. So certain areas are very challenging to implement. Going into bytecodes, there has been a lot of talk about adding bytecodes to support these languages, and I think that is a good thing for them; it's a more natural fit. Building out all the infrastructure you need to get these languages to work on top of the JVM takes a lot of time, so anything JVM implementers can do to assist is a win for the Java community as a whole.
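Since the JVM has no API for changing the process working directory, a JVM-hosted language has to emulate `chdir`. A common workaround, sketched here with a hypothetical `VirtualCwd` class, is to track a per-interpreter current directory and resolve every relative path against it; the JVM's own working directory never changes.

```java
import java.nio.file.Path;

// Hypothetical per-interpreter working directory for a JVM-hosted scripting language.
final class VirtualCwd {
    private Path cwd;

    VirtualCwd(Path start) {
        this.cwd = start.toAbsolutePath().normalize();
    }

    Path getcwd() { return cwd; }

    // Relative targets resolve against the tracked directory; absolute ones replace it.
    // A fuller version would reject targets that are not existing directories.
    void chdir(String target) {
        cwd = cwd.resolve(target).normalize();
    }

    // File operations in the hosted language would route every path through here
    // instead of relying on the JVM's fixed working directory.
    Path resolve(String path) {
        return cwd.resolve(path).normalize();
    }
}
```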
14. You've worked at Orbitz a while. What do you like about the architecture, things you think were done really well, and what do you really dislike and would like to redo as soon as possible?
One of the things I enjoy most is the scale. I was on a panel yesterday where they asked for a show of hands of who had a 4-node system, and most people raised their hands; then they asked who was on a 40-node system, and a handful of people raised their hands. We far exceed that, and it's that kind of scale that I find really interesting. We've had people move on, and when we've talked about our experiences later, one of the things they miss is that we have so many interesting problems to solve. It's not a single box with a little business application sitting on it that you need to work on; we have this vast array of problems, and if you are interested in a problem in computing, we probably have an area where it applies. So the opportunities are one of the things I like most about working there.
I think one of the problems with success, with a site that's very big and has a lot of users, is that you can't move as quickly as you might want to. So it's a double-edged sword: you want a lot of traffic, a lot of users, a lot of features, and you want to build these bigger systems, but you lose some nimbleness once you reach a certain size. We talk with other people who can make these changes on a whim, and we are not able to do that. Sometimes it would be nice to be able to, but other times it's nice to have these other problems to solve.