
Alex Papadimoulis on Delivering Web Scale Systems

Interview with Alex Papadimoulis by Manuel Pais on May 25, 2013

Bio: Alex is a speaker and writer who is passionate about looking beyond the code to build great software. In addition to founding Inedo, the makers of the popular DevOps platform BuildMaster, Alex also started The Daily WTF, a fun site dedicated to building software the wrong way.


   

1. Hello, I’m Manuel Pais. I’m here with Alex Papadimoulis at QCon London. Alex is the founder of Inedo, and he gave a presentation about Delivering Web Scale Systems. Alex, can you briefly explain the distinction you made between Distribution and Delivery?

Sure. Generally, when you deploy software to an environment, you just take the bits you want to deploy and put them on X number of servers. But when we are dealing with web scale, it’s important to consider that we can’t directly put all those files on the environment, for performance reasons. So distribution is the act of putting them into a staging area, or onto a server, from which they can then be delivered to all of the different servers. Delivery, effectively, is just the act of taking those bits of software and putting them on N number of servers.
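A minimal sketch of the two steps, under assumptions that are not in the interview: local directories stand in for the staging area and for the N web servers, a plain file copy stands in for the real transport, and the function names `distribute`, `deliver_to`, and `deliver` are hypothetical. The release is copied into staging once (distribution) and then fanned out to every server concurrently (delivery).

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def distribute(build_dir: Path, staging_dir: Path) -> Path:
    """Distribution: copy the release bits once into a staging area."""
    release = staging_dir / build_dir.name
    shutil.copytree(build_dir, release)
    return release


def deliver_to(release: Path, server_root: Path) -> Path:
    """Delivery to a single server: copy the staged release into its app directory."""
    target = server_root / release.name
    shutil.copytree(release, target)
    return target


def deliver(release: Path, servers: list[Path], max_parallel: int = 4) -> list[Path]:
    """Delivery: fan the staged release out to N servers, a few at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps results in the same order as `servers`.
        return list(pool.map(lambda root: deliver_to(release, root), servers))


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    build = root / "build" / "app-1.0"
    build.mkdir(parents=True)
    (build / "index.html").write_text("hello")
    staging = root / "staging"
    staging.mkdir()
    web_servers = [root / f"web{i}" for i in range(3)]
    for server in web_servers:
        server.mkdir()

    staged = distribute(build, staging)
    for path in deliver(staged, web_servers):
        print(path.relative_to(root))
```

In a real setup the staging area would typically be a separate server or artifact store close to the web tier, so the slow transfer from the build machine happens only once.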

   

2. You also identify three different types of roll-outs: Live, Rolling and Parallel. Can you explain the differences a bit, and in which contexts you would pick one over another?

With Live, you just have your group of servers, your cluster, and you update them one by one or in parallel, but there is no real logical order to it. You just do them in whatever order you like while the web app is live. The consequence is that you have code changing in the middle of a live, running application, which can be just fine depending on your application. The nice thing about Live is that it doesn’t involve any other servers: you are just deploying to the servers that are running the app. But your application needs to be able to support that; not everything can handle half of the servers being on a different version than the other half. The other two involve pulling servers out so they are not serving traffic. With Rolling, you pull half of your servers out of the load balancer, deploy to those servers, and swap them back in, whereas with Parallel you set up an entirely separate stack of servers and deploy to those. The reason to use one of the other two is when you can’t do Live, and the main technology that enables them is the cloud. These things are a lot easier when your servers are virtualized, and on some platforms, like Amazon EC2, it’s almost trivial to set up an entire new stack of servers, deploy to them, and then tear down and discard the old ones.
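The three strategies can be compared with a toy in-memory simulation that records which versions the load balancer is sending traffic to at each step. Everything here is hypothetical (the `Server` and `LoadBalancer` classes, the version strings), and a real rollout would also drain connections and run health checks before swapping servers.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    version: str


@dataclass
class LoadBalancer:
    # Servers currently receiving traffic.
    pool: list = field(default_factory=list)

    def versions_in_rotation(self) -> set:
        return {s.version for s in self.pool}


def live_rollout(lb: LoadBalancer, version: str) -> list:
    """Update each server in place while it keeps serving traffic."""
    observed = []
    for server in lb.pool:
        server.version = version
        observed.append(lb.versions_in_rotation())  # may mix old and new
    return observed


def rolling_rollout(lb: LoadBalancer, version: str) -> list:
    """Pull half out, deploy to it, swap halves, then deploy to the other half.

    Assumes at least two servers.
    """
    servers = list(lb.pool)
    half = len(servers) // 2
    first, second = servers[:half], servers[half:]
    observed = []
    lb.pool = second                  # first half out of rotation
    for s in first:
        s.version = version
    observed.append(lb.versions_in_rotation())
    lb.pool = first                   # swap: updated half in, old half out
    for s in second:
        s.version = version
    observed.append(lb.versions_in_rotation())
    lb.pool = first + second          # everything back in rotation
    observed.append(lb.versions_in_rotation())
    return observed


def parallel_rollout(lb: LoadBalancer, version: str) -> tuple:
    """Build a whole new stack on the new version, then switch traffic to it."""
    new_stack = [Server(f"{s.name}-new", version) for s in lb.pool]
    observed = [lb.versions_in_rotation()]        # old stack still serving
    old_stack, lb.pool = lb.pool, new_stack       # single cut-over
    observed.append(lb.versions_in_rotation())
    return observed, old_stack                    # old stack can now be torn down


def fresh_cluster(n: int = 4) -> LoadBalancer:
    return LoadBalancer([Server(f"web{i}", "1.0") for i in range(n)])
```

Starting from four servers on 1.0, `live_rollout` briefly serves both versions at once, while `rolling_rollout` and `parallel_rollout` only ever expose one version to traffic, at the cost of changing the load-balancer pool (and, for Parallel, of provisioning a second stack).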

   

3. So do you think using a cloud provider makes it easier to deliver web-scale applications?

Live is by far the easiest way to go. Then you don’t have to worry about pulling servers out of the load balancer or switching load balancing. I like simple, so I would say: if you can, do a Live rollout. If not, and you are going to go to the trouble of setting up a cloud-based delivery, you may as well go with Parallel. Both Rolling and Parallel are complex, so if you are going down that road, you may as well set up Parallel if possible. Rolling tends to be preferred when you don’t have instantly virtualizable servers, like physical servers on racks or well-defined servers that you can’t easily move in and out.
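Alex’s rule of thumb can be written as a small decision function. The two boolean inputs are simplifications introduced here for illustration, not criteria he lists formally.

```python
from enum import Enum


class Rollout(Enum):
    LIVE = "live"
    ROLLING = "rolling"
    PARALLEL = "parallel"


def choose_rollout(tolerates_mixed_versions: bool, can_provision_on_demand: bool) -> Rollout:
    # Simplest first: Live needs no extra servers and no load-balancer changes,
    # but only works if old and new versions can run side by side.
    if tolerates_mixed_versions:
        return Rollout.LIVE
    # Rolling and Parallel are both complex; if new servers are cheap to
    # create (e.g. virtualized cloud servers), go all the way to Parallel.
    if can_provision_on_demand:
        return Rollout.PARALLEL
    # Physical or otherwise fixed servers that can't easily be moved: Rolling.
    return Rollout.ROLLING
```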

   

4. Another concept you talked about was separating the delivery of files, databases, and environment changes. Why do you recommend that, and what timeframes would be advisable for each type of change?

An application really consists of only those three elements. A release of an app will only have the files that run the code, the data, and whatever special environmental considerations it has. As with everything, you want simple to reduce the risk: fewer moving parts, fewer pieces. So if you can pre-deliver database changes, say something as simple as adding a column, chances are you can add that column now and none of the code will mind. It’s just a column that’s there that the code doesn’t really care about. So if you can deploy the database changes first, you’ll remove a lot of the risk of deploying file and database changes at the same time later. Environmental changes are another thing you can do well beforehand. If you need to add a new website or a new configuration for Tomcat, you can do that well ahead of time, before you deploy your actual code. It just gets tricky, and at web scale very tricky, when they need to go out at exactly the same time; that should generally be avoided if possible.
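A minimal sketch of pre-delivering an additive schema change, using Python’s built-in `sqlite3` module with an in-memory database; the `users` table and the old/new functions are hypothetical. The column is added before the release that needs it, and the code that is already running keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_code_create_user(name):
    # Release N: names its columns explicitly and knows nothing about "email".
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def old_code_list_users():
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()


old_code_create_user("alice")

# Pre-delivered database change, shipped well before the release that uses it:
# a nullable column that the current code simply ignores.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Release N keeps working, unchanged, against the new schema.
old_code_create_user("bob")
print(old_code_list_users())  # [(1, 'alice'), (2, 'bob')]


def new_code_create_user(name, email):
    # Release N+1, delivered later as a files-only change.
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


new_code_create_user("carol", "carol@example.com")
```

This only holds if the old code names its columns explicitly; code that unpacks `SELECT *` into a fixed number of variables, or that inserts without a column list, would notice the new column.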

   

5. And when you decouple infrastructure deployments from application deployments, how do you guarantee that the application’s environmental needs are still the same when you deploy it as when you deployed the infrastructure, since there can be changes in between?

Sure, and that’s actually a big problem. Your testing, staging, and production environments are going to get out of sync, and one proposed solution is to deploy entire new sets of infrastructure whenever you deploy your application. That again creates a more complicated deployment, and while it does guarantee that an environment is going to be exactly what you specify, I think a better approach, if you have the ability to deploy infrastructure on demand (which in the cloud is relatively easy), is to deploy those environments well ahead of the applications rather than at the same time. When your environments are code-driven, or well defined in some tool or configuration file, you don’t need to deploy them and the app at the same time. The deployments can be orthogonal, and that reduces the risk a lot.
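One way to act on this (an assumption, not something Alex describes) is to keep both the environment definitions and the application’s needs as data, deploy environment changes first, and gate the later application deployment on a check that the target environment already satisfies them. The environment names, keys, and versions below are all hypothetical.

```python
# Hypothetical environment definitions, as they might be kept under version control.
ENVIRONMENTS = {
    "testing":    {"java": "17", "tomcat": "10.1", "site:shop": "present"},
    "staging":    {"java": "17", "tomcat": "10.1", "site:shop": "present"},
    "production": {"java": "17", "tomcat": "9.0"},
}

# What the next application release expects to find (also hypothetical).
APP_REQUIREMENTS = {"java": "17", "tomcat": "10.1", "site:shop": "present"}


def unmet_requirements(env: dict, required: dict) -> dict:
    """Return {key: (required, actual)} for every requirement the env doesn't meet."""
    return {
        key: (want, env.get(key))
        for key, want in required.items()
        if env.get(key) != want
    }


def ready_for_app_deploy(env_name: str) -> bool:
    gaps = unmet_requirements(ENVIRONMENTS[env_name], APP_REQUIREMENTS)
    for key, (want, have) in gaps.items():
        print(f"{env_name}: {key} needs {want!r}, found {have!r}")
    return not gaps
```

Here staging passes but production does not, which signals that the Tomcat upgrade and the new site should go out first, in their own deployment window, before the application release.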

   

6. But do you have any mechanism to ensure that the application’s configuration definition is not changed, or, if it does change, that you redeploy the infrastructure? How would you manage that?

Well, if you are using a tool like Puppet or Chef for server configuration management, the manifest files ensure that the servers match whatever is defined in them. So the idea is that when you change your manifest file, the tool goes and alters all those servers, and as long as those tools are working correctly, your manifests and your servers will be in sync. The idea is just that if you are going to change manifest files, do it at a different time than an application deployment.
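The desired-state idea behind tools like Puppet and Chef can be shown with a toy `converge` function. This is a simplified model, not either tool’s API: the manifest is a plain dictionary of hypothetical resources, and anything the manifest doesn’t mention is left untouched.

```python
def converge(actual: dict, desired: dict) -> list:
    """Bring `actual` in line with `desired`; return a description of each change.

    Only resources named in `desired` are managed; anything else on the
    server is left alone. Running it again on a converged server is a no-op.
    """
    changes = []
    for resource, state in desired.items():
        current = actual.get(resource)
        if current != state:
            changes.append(f"set {resource} = {state!r} (was {current!r})")
            actual[resource] = state
    return changes


manifest = {"package:nginx": "installed", "service:nginx": "running"}
server = {"package:nginx": "installed", "service:nginx": "stopped", "user:ops": "present"}

print(converge(server, manifest))  # ["set service:nginx = 'running' (was 'stopped')"]
print(converge(server, manifest))  # []
```

Because a second run makes no changes, the agent can re-apply the manifest as often as it likes; a manifest edit, by contrast, changes every server it applies to, which is why it deserves its own window apart from application deployments.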

   

7. You also advised using pull systems mostly for infrastructure deployment but push systems for application delivery. What is the fundamental reason for that?

Pull for application delivery is very difficult because you can’t orchestrate and control exactly how you want it done. You have to somehow control all the servers that are having code deployed to them, and push-based delivery gives you that very explicit control. With pull-based infrastructure, the advantage is that you can take an empty server, well, a server with an operating system and your CM framework like Puppet or Chef, drop it into the network, and it will automatically configure itself through pull-based mechanisms. Generally, when you do that, you don’t need an instant deployment, whereas when you do an application deployment, you want the deployment time to be as short as possible. With push, you control that window directly; with pull, you have to make sure every server knows exactly when and how to pull in order to minimize your deployment window.
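A back-of-the-envelope model of the timing argument: pull agents pick up a change at their next poll, so the rollout is spread over up to one polling interval in an order nobody chose, while a push coordinator deploys in explicit batches. The polling interval, poll offsets, batch size, and per-batch time below are made-up numbers for illustration.

```python
import math


def pull_apply_times(publish_at: float, poll_offsets: list, interval: float) -> list:
    """Each agent polls every `interval` seconds starting at its own offset and
    applies the change at its first poll at or after `publish_at`."""
    times = []
    for offset in poll_offsets:
        polls_to_wait = max(0, math.ceil((publish_at - offset) / interval))
        times.append(offset + polls_to_wait * interval)
    return times


def push_apply_times(start_at: float, servers: int, per_batch: float, batch_size: int) -> list:
    """A coordinator deploys in explicit batches, one batch after another."""
    return [start_at + (i // batch_size) * per_batch for i in range(servers)]


def spread(times: list) -> float:
    """Time between the first and last server picking up the change."""
    return max(times) - min(times)


# Hypothetical: change published at t=1000 s, agents poll every 30 minutes.
pull = pull_apply_times(1000, [0, 400, 900, 1500, 1790], 1800)
push = push_apply_times(1000, servers=5, per_batch=30, batch_size=2)
print(pull, spread(pull))
print(push, spread(push))
```

With these numbers the five pull agents apply the change over a 1,200-second spread, in an order set by their poll schedules, while the push coordinator finishes in 60 seconds in the order it chose.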

   

8. You’ve also talked in the past about over-engineering and getting rid of code complexity. In terms of build and delivery processes, what do you think are the greatest enemies of simplicity?

I’ve seen a lot of different patterns emerging, especially with the cloud, and I think one of the most complex patterns I’ve seen is this notion of a fully integrated application server stack. The idea is that you take all of your server definition and all of your application definition, create a server out of it, deploy your application to it, and then promote that through each of the environments. The reason that is complex is that, without virtualization, it would essentially be like building a server and then moving it from rack to rack to rack as you go to production. And that’s complex because you have a lot of moving parts every time you come up with a new server, since it’s not going to be the exact same image: it’s going to have new patches, perhaps new versions of some underlying components. So deploying the whole server with your application is a really good way to make your application delivery a lot more complicated. There are a lot of simple tools available; just try to keep changes isolated and deployments isolated, and keep things as simple as possible by deploying the smallest number of things in the easiest way possible.

   

9. What about changes that you cannot decouple, for example a database change that is not backward compatible? What would you suggest in that case to minimize the risk of the overall change?

If you can’t make a database change that works with both the old and the new versions of the code, then just be careful, and be aware that as soon as you deploy that database change, you can’t roll it back. You can run a script to try to undo the change, but there is really no good way to roll it back. Files are easy to roll back, so if the two have to go out at the same time, just be aware of that and deploy both carefully together.
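Alex notes that files are easy to roll back. One common way to make that true (an assumption here, not something described in the interview) is to keep each release in its own directory and point a `current` symlink at the active one, so a rollback is just re-pointing the link. The directory layout and the `activate` helper are hypothetical, and the atomic swap relies on POSIX `rename` semantics.

```python
import os
import tempfile
from pathlib import Path


def activate(releases_dir: Path, version: str, current_link: Path) -> None:
    """Point `current_link` at releases_dir/version, swapping the link atomically."""
    tmp_link = current_link.with_name(current_link.name + ".tmp")
    if tmp_link.is_symlink():
        tmp_link.unlink()               # clean up a leftover from an interrupted run
    os.symlink(releases_dir / version, tmp_link)
    os.replace(tmp_link, current_link)  # atomic rename on POSIX


root = Path(tempfile.mkdtemp())
releases = root / "releases"
for v in ("1.0", "2.0"):
    (releases / v).mkdir(parents=True)
    (releases / v / "VERSION").write_text(v)
current = root / "current"

activate(releases, "2.0", current)  # deploy the new files
activate(releases, "1.0", current)  # roll back: the 1.0 files are still on disk
print((current / "VERSION").read_text())  # 1.0

# There is no equivalent switch for a schema change that has already
# dropped or rewritten data; that is the asymmetry described above.
```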

   

10. In your opinion, what are the most important aspects to keep in mind for an effective delivery process that can also scale easily for web-scale applications?

I think simplicity. A lot of it goes back to keeping things as simple as possible, and as you scale, you can make your deployment plan more complex. The Live approach is very easy: if you can build a website that is compatible with Live-style rollouts, that is going to be the simplest option possible. Then you don’t really have to worry about figuring out how to pull things in and out of load balancing or into another virtual network. But one important thing is that, whatever you do, consider deployment from the beginning. Deployment is critical in a web-scale app, or really in a lot of applications, because it is the mechanism by which the application changes, and if you can make changes to the application easier, that is a lot more value you are able to provide as software developers.

   

11. Now, for a final question, can you tell us a bit about your current interests and other projects you are developing or interested in?

Right now, through Inedo and our product BuildMaster, we are really exploring this space, especially as it applies to the cloud and cloud delivery systems. What is particularly interesting is a sort of mix between infrastructure as a service and platform as a service: things like Amazon CloudFormation and similar tools. They allow you to define a very explicit infrastructure and have your application live in it, but without being as tightly coupled as platform as a service. So we are exploring best practices there and working with a couple of different companies who are starting to explore those tools, and it seems like an interesting space right now.

Manuel: OK, thank you very much Alex!

My pleasure!

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.