
The Rise of DevOps with Jesse Robbins


1. This is Harry Brumleve. I’m at QCon San Francisco 2012, sitting with Jesse Robbins. Jesse, can you tell us a little bit about yourself?

I’m one of the cofounders of Opscode. We make an infrastructure automation framework called Chef, which is increasingly popular in a wide variety of communities; a lot of developers use it to automate, scale, manage and deploy infrastructure as code. Before that I created a conference called Velocity, which is the web performance and operations conference. It is now about two thousand people every year and it’s in three different cities. So it’s a nice big movement that started really around a set of ideas that was catalyzed into DevOps: the concept that developers and operations need to work together as part of a team towards common goals rather than being two warring factions who hate each other. Prior to that I ran operations for Amazon; my title there was “Master of Disaster”, which comes from my background before that, where I was a firefighter, EMT and emergency manager. So I bring all these different things together and try to give people more powerful tools and a more powerful culture to build great stuff.


2. So you are here at QCon to talk about culture of DevOps and how it relates to the entire company[...]Is what you are talking about really Lean apps?

Harry's full question: So you are here at QCon to talk about culture of DevOps and how it relates to the entire company and there are some other practices and movements going on specifically Lean process, Don Reinertsen talking about Kanban and Toyota, Lean UX, Jeff Patton was here, all talking about cycling and learning and building and measuring. Is what you are talking about really Lean apps?

It’s interesting: what Lean describes and tries to capture comes from the supply chain and manufacturing revolution that started in the 70’s with the Toyota Production System. Really the beginning of that was that engineering, the people designing the car that you are trying to manufacture, and the manufacturing process had to be very tightly coupled and work together to consistently deploy something over a period of time that emerged through the production cycle with no defects. That has resonated extremely well with the model that we think of when building and running large websites or software within enterprises or really anything. What we build as software developers, as architects, as engineers on either side, is something that has to take the entire system into consideration. I like to talk about the difference between Toyota and GM. Toyota had Kanban, so when you are manufacturing something and it’s broken, you pull the andon cord and they stop the line and you fix it, in the same way that you want development and operations working together.

So you don’t ship some code over the wall, which historically was the model in our world: “Dev wrote the code and shipped it to Ops, who then had to make it work”. That is what GM would do: you consistently put the engine in backwards, and so they make a team flip the engine the right way and then do all this post-hoc fixing. When we think about what is happening with the DevOps Movement, with infrastructure as code, with all these things coming together, what we are really talking about is a model where teams work together to produce consistent, high-quality software that delivers business value using a set of processes that are integrated at every step. And that is what we try to help people to do at a technology level with Chef and at a cultural level by making it clear that it’s not just an Ops tool or a Dev tool, it’s a deployment tool that is used to make the whole business work better consistently.


3. Jeff Patton would argue that you bring in developers and within the designers you have a balanced team, as he put it, you would give up share the Two-Pizzas?

Yes, the Two-Pizza model emerged at Amazon. The interesting thing was that at that point we started embedding our operations teams at Amazon into those Two-Pizza teams in order to help them ship smaller features faster. The most important part is understanding that operational considerations are a part of the application design: how you deploy your software, how you manage your software, how you troubleshoot, how you roll forward your software when you need to push a fix. These are not distinct things that happen after it’s done; it is part of a lifecycle that begins with the initiation of a product or project. So when you think about how those teams need to come together, they absolutely need to be involved early, or operations needs to provide enough of a platform that there are good sane defaults that developers can use just to get going without their involvement, as long as they fit into an existing model.


4. So what you are talking about really is the rise of operations as a First Class Citizen in the development process?

It’s not just that it’s a First Class Citizen. In 2007, in a seminal post for the beginning of Velocity and the awareness of what we were trying to do over the next few years in web operations, I said that success at scale depends on the ability to consistently build and deploy reliable software to an unreliable hardware and platform that scales horizontally. That is the success pattern, and that incorporates everyone really as equals working together towards a common business purpose. This is the thing that Agile made so clear, and this is what has come from the Lean movement and many other approaches: you only get value when you are actually deploying and running your software, and so that’s got to be built in as part of the concerns on day one.


5. So you kind of mentioned a Data Center Level API in previous questions, can you talk a little bit about that, maybe how that applies to EC2 and what everybody would know?

One of the really profound things that has happened over the past 5 or 6 years is that EC2, and infrastructure as a service generally, made it possible for infrastructure as code to really exist. Before that point, Data Center Level APIs were a rarity. So it’s not just your servers, it’s also your storage, your networking, your load balancing, all these other components. And historically those have been largely manual processes, so it was very hard to write an automation package that touches all the different pieces and integrates with your software deployment model prior to the existence of something like EC2. Chris Brown is Opscode’s CTO and he was the founding architect of EC2, so he actually designed the API and the patterns that came from that.

What he focused on was not trying to enable the previous processes, where you’d file a ticket and make a request, but instead to just say: “Yes, infrastructure should appear to serve a purpose and disappear when it’s done”, and you want to take as many of the arbitrary bottlenecks and constraints out as possible and render it down to a primitive. And that has served as the basis for the progress that we are seeing generally across infrastructure, across the ways that people are delivering faster and faster; we call it “Time-to-Value”. So when you take a lot of unnecessary choice and variability and frustrating manual processes out early, and you instead say: “We are going to make this as easy as an API call, and you are going to be able to talk to it with your configuration management platform, which integrates deeply with your software deployment model, which integrates with your load balancing and everything else, and that can all happen with code rather than with people having to be in the loop constantly. Rather, it’s people writing the automation”, that changes the world and makes entirely new things possible.
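The idea that infrastructure should "appear to serve a purpose and disappear when it’s done" can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `FakeCloud` is a hypothetical in-memory stand-in for a data-center-level API, not EC2's real interface, and `ephemeral_cluster` and `run_job` are invented names.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a data-center-level API."""
    running: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self, role: str) -> str:
        # One API call takes the place of filing a ticket and waiting.
        instance_id = f"i-{next(self._ids):04d}"
        self.running[instance_id] = role
        return instance_id

    def terminate(self, instance_id: str) -> None:
        self.running.pop(instance_id, None)


@contextmanager
def ephemeral_cluster(cloud: FakeCloud, role: str, size: int):
    """Provision `size` instances, yield their ids, then tear them down,
    even if the work done inside the `with` block raises."""
    ids = [cloud.launch(role) for _ in range(size)]
    try:
        yield ids
    finally:
        for instance_id in ids:
            cloud.terminate(instance_id)


def run_job(cloud: FakeCloud, size: int) -> int:
    """Run a placeholder job on a temporary cluster; return the cluster size used."""
    with ephemeral_cluster(cloud, "worker", size) as ids:
        return len(ids)
```

Calling `run_job(FakeCloud(), 3)` provisions three placeholder instances for the duration of the job and leaves none running afterwards; the lifecycle is expressed in code, with no person in the loop.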

The best example that we use now, just taken ad absurdum: one of our customers, Cycle Computing, now deploys fifty-thousand, and nearing a hundred-thousand, core supercomputing clusters on EC2 using Chef and a management console that they built called “Grill”, which allows them to scale up to a massive supercomputer on EC2 in about 40 minutes for a single scientist’s job, where they are doing protein folding for drug discovery and trying to cure cancer. That is an application pattern that was not possible prior to the emergence of both of these ideas, the Data Center Level API and then the frameworks that work with it, of which I’m proud to say that Chef has become both very popular and served as the base of thousands of organizations’ automation approaches, so that developers and operations can work together towards that common purpose in a totally different way.


6. If Chef and DevOps and the Data Center Level API are all working the way they should, you’d almost not notice Ops; DevOps becomes kind of ubiquitous and kind of becomes wallpaper. How would you demonstrate its value to, say, a manager or a CTO?

Well, it’s interesting. Usually when companies figure this out the line becomes so blurred that I like to say: “Infrastructure as code, applications as services, and Dev and Ops as teams”; these are the attributes of successful organizations over time. So as they begin to do this, because the lines are so blurred, the patterns really shift. So the demonstrated value is in the agility that the company gets, and I haven’t run into an organization that’s really figured out how to do this well where anyone then begins to forget that: “Remember, this is an operational discipline”. What is different, the way that you can tell, is that they really focus on time-to-value: what is in the way of us getting great things built and deploying them in a way that is easy to manage and scale and that supports the needs of the business, so it’s very efficient and enables it to change very quickly.

And that is the evolution that you end up seeing, from Dev and Ops being separated to much more of what ultimately becomes a platform as a service, a PaaS. And we see this with our most mature customers: they’ve got this awesome foundation that they’ve built, the infrastructure as code, they are building their applications so they can easily deploy them, change them, etc., and then really what happens is that what we think of as Ops becomes the provider of that platform. So they are making sure that there is enough network capacity, that the tool chain that is available to developers end-to-end provides enough sane defaults that they don’t have to call them in order to get help for the normal stuff, and then when something breaks, they can step in as the actual Fire Department, saying: “We know how this all connects together and we can help you”.

So they are still working together very much as teams, but really when organizations figure this out, the challenge is how Ops demonstrates its value, because no one is looking at that as an independent thing anymore. Instead it’s: are we doing enough to capture the value out of what we are building, can we do more, should we be doing things differently, should we consider other platforms and approaches that become possible because it’s so easy to change quickly?


7. So I come from a pretty small shop, we don’t have a lot of automation, we have the aspiration to become awesome, how do we get started?

With the work that we’ve done at Opscode and with Chef, we really have focused on enabling as many different organizations at different scales as possible to realize the power of infrastructure automation and not have to figure everything out from first principles. So with Chef there is an open-source version, but in general if you are a small shop we recommend you actually just try the hosted version; we have a free trial and, below a certain size, it’s free today. That is connected with a community site where we have over 700 recipes, many of which just work out of the box and allow you to deploy, configure and manage an environment on EC2 or on your desktop or in a data center, it doesn’t matter, with a single command and as code.

So you are able to really build around these components without having to figure it out from first principles or try to be a sys-admin, when really you are trying to deliver value directly to the customer and not care about the underlying components at the early stages. We are really proud of that, and it’s not something that we’ve built. We built the community site and we manage about a hundred of those recipes, the cookbooks that are shared, but really it’s a community that comes together, about 20,000 people or so, who are building these components. And yes, there are many ways to configure MySQL, but you probably don’t really want to start off a project that just needs a database by figuring out the right way; use the best way that is available on the community site and then change it over time.

So that is one way that we see advantage, and that stands on the back of the infrastructure-as-a-service stuff that is already there. So you can already build with EC2, you can already build with Rackspace, or you can use tools like Vagrant, which allow you to build these patterns early while running on a local box in your local developer environment. So you say: “Great, now I’ve got this running locally”, and it’s the same one command to deploy that into production and get it running using the same tool chain, so using Chef and then whatever other components Chef pulls in for you as part of your automation. That way you have infrastructure as code right away and you are not suddenly taking on an operational burden in production that you didn’t have in development. And that is kind of the value: don’t repeat yourself, just build it right early. That is how small shops are able to use this, and we see this all the time.
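As a rough illustration of "infrastructure as code" applied identically to a local box and to production, here is a minimal Python sketch of declaring a desired state and converging a machine towards it. This is not Chef's API (Chef recipes are written in a Ruby DSL); `Node`, `DESIRED` and `converge` are hypothetical names, and the package and service names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical machine: a local developer VM or a production server."""
    name: str
    packages: set = field(default_factory=set)
    services: set = field(default_factory=set)


# The desired state is plain data, so it can live in version control
# alongside the application it supports.
DESIRED = {
    "packages": {"mysql-server", "nginx"},
    "services": {"mysql", "nginx"},
}


def converge(node: Node, desired: dict) -> list:
    """Bring `node` to the desired state and return the actions taken.

    Only missing pieces are acted on, so running it a second time
    against the same node does nothing.
    """
    actions = []
    for package in sorted(desired["packages"] - node.packages):
        node.packages.add(package)
        actions.append(f"install {package}")
    for service in sorted(desired["services"] - node.services):
        node.services.add(service)
        actions.append(f"start {service}")
    return actions
```

The same `DESIRED` definition can be applied to `Node("local-vm")` and to `Node("prod-1")`; each node receives only the actions it is missing, and both end up in the same state, which is the sense in which development and production share one tool chain.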

So one of our earlier success stories was a company called “Go Time”. They were the first customer on hosted Chef. Four guys; Go Time’s now a much bigger company, but they went two years just as four developers with no outside Ops help, using off-the-shelf components, and reached really large scale really quickly because they had made that early infrastructure investment, implemented it easily and then managed it as code going forward. It made a very different story for them than if they’d had to struggle through at each stage, introducing new bottlenecks. That is really where we believe the future is, and that is what I’m proud to be part of helping to make happen.

Harry: Thanks a lot for your time Jesse!

Thank you!

Jan 17, 2013