Monty Taylor and Jim Blair on CI and Test Automation at OpenStack
The OpenStack community has a team working on CI and test automation for the OpenStack developers submitting code.
They run their own infrastructure, which is itself hosted on OpenStack clouds.
Given the complexity of the project, with dozens of dependent projects and over 300 contributors submitting patches every month, standard CI systems simply wouldn't work.
InfoQ: Today, how many commits does your CI system handle every day? How about 6 months from now, when Icehouse is released?
Monty: During the peak, I think we landed around 400 commits a day. That's only the ones that succeeded, which is less than the number of changes that we test, because they only land if they passed the tests.
Jim: And once a commit has been reviewed, we test it again before we merge it.
Monty: For each of those commits that we merge, there are about 8-10 different jobs that we run. And since we run them once on upload and once again before landing, that makes 20 jobs per change. There was a period of time, during the peak, where we ran 10,000 jobs in a day.
I think we doubled our velocity from Grizzly to Havana. I believe we have been consistently doubling each cycle, so by the time Icehouse releases, I expect this will double again.
InfoQ: What are the test jobs you run?
Jim: There is a coding style check. This is important because we have so many collaborators working, we have to make sure people follow the same coding conventions and use the same coding styles. It’s one of our simplest jobs, but also one of our most important.
There are unit test jobs, simple jobs just testing the project in question, with no network interactions with other components. We run those on a couple of supported Python versions: 2.6, 2.7 and 3.3. We run our 2.6 jobs on CentOS, and our 2.7 jobs on Ubuntu.
Then there are the integration test jobs. That's where we use DevStack to install all of the components, then we run Tempest against all of those components after they have been set up on a single-node cloud instance. And we run several variations of that - all of the components can be configured in different ways, they can use different databases or message queues, etc. We could run a lot of variations but we try to keep it small and run only the most sensible variations: MySQL, PostgreSQL, RabbitMQ.
Monty: We’re actually talking about adding ZeroMQ tests.
Jim: If a component becomes really important in the community and more and more people are using it, and if more people are willing to help debug problems, then we'll start running tests for that as well.
InfoQ: Who writes these test jobs?
Monty: The developers. We have a small QA team primarily focusing on the machinery of testing systems rather than the test content itself. So we require that our developers write tests. They write unit tests and integration tests.
Jim: In fact, we are talking about getting even stricter about that in this cycle: if you want to land a change, you need to have already written all the integration tests for that change.
Monty: Our view is that if it's not tested, it's broken. It's usually true.
Jim: Especially because the project moves so quickly. There are so many pieces, it's just too easy for somebody to accidentally break it.
InfoQ: Are you covering performance testing?
Jim: Not yet, we’d like to get to the point where we can. I think Boris Pavlovic is working on a performance testing system called Rally. Joe Gordon has been working on scalability testing which is a bit different from performance testing but pretty related. We'd like to do those things.
There are many things we are not testing. But there is nothing that we are planning on not testing. We want to test everything but it takes time.
For this cycle, we are focusing a lot on upgrade testing. We have a small bit of upgrade testing now, but we want to do a lot more.
InfoQ: How long does it take to run a single test job on an instance?
Monty: It takes about 20-40 minutes, depending on the cloud instance.
Jim: We have done a lot of work to parallelize our testing, and not only by running all of these different variants at the same time. We built a test framework called Test Repository for most of our unit tests, which is really good at executing tests in parallel. It gives results pretty fast.
Monty: There is also Zuul, written by Jim, which allows us to run sets of jobs that test changes in parallel while still preserving their sequence.
InfoQ: How many machines do you have to run the tests? What is the configuration for the instances used to run each of the test jobs?
Monty: We have no machines. All of our tests are running in public clouds. We have public cloud accounts donated by Rackspace and HP, and thankfully, they don't charge us any money. We have pretty much as many instances as we want.
Jim: During the last cycle, I think we peaked at 340 instances running in parallel. Each instance is a VM. For integration tests, we start with a very basic VM - 8GB of RAM with Ubuntu Precise, and whatever goes along with this amount of RAM. So we'll grab a node and let DevStack install the cloud on this very basic VM.
Monty: It's more complicated than that, but that's the basic idea. We have a thing called nodepool, which manages the collection of VMs that run this and prepares them by caching. We want to pre-download everything that DevStack will want to download from the internet, so that the tests themselves won't need to touch the internet.
Jim: And when we're done, we delete it.
Also, we spin up many more VMs than we have successful test jobs. Because of Zuul's speculation model, sometimes it will be halfway through a test and then realize it needs something from somewhere else, so it no longer needs to run this test; it needs to run a different one, so we'll kill it and spin up another one. If we run 10000 jobs a day, we might actually spin up 100000 VMs.
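The speculative model described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Zuul's actual code: the function `gate` and the `passes` callback are hypothetical names, and the real system runs the jobs concurrently on separate VMs and cancels stale runs as soon as a failure is known, rather than working in rounds.

```python
from typing import Callable, List, Tuple


def gate(queue: List[str],
         passes: Callable[[Tuple[str, ...]], bool]) -> Tuple[List[str], int]:
    """Simulate speculative gating of an ordered queue of changes.

    Every change is tested on top of all changes ahead of it, on the
    assumption that those will merge. When a change fails, it is dropped
    and everything behind it is retested without it.
    Returns (changes merged in order, number of test runs started).
    """
    merged: List[str] = []
    pending = list(queue)
    runs_started = 0
    while pending:
        # One round: start a run for every pending change, each stacked
        # on the already merged changes plus the changes ahead of it.
        results = []
        for i in range(len(pending)):
            stack = tuple(merged + pending[:i + 1])
            runs_started += 1
            results.append(bool(passes(stack)))
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            # Changes ahead of the failure merge; the failing change is
            # dropped and the runs behind it are thrown away.
            merged.extend(pending[:first_bad])
            pending = pending[first_bad + 1:]
    return merged, runs_started
```

With a queue of `["A", "B", "C", "D"]` where any stack containing `B` fails, `gate(["A", "B", "C", "D"], lambda s: "B" not in s)` returns `(["A", "C", "D"], 6)`: six runs are started to land three changes, which is the same effect as spinning up more VMs than there are successful jobs.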
InfoQ: Can we consider Zuul as an improvement to the nvie git branch model that OpenStack is adopting? It seems that Zuul doesn't work if we have too many working branches.
Monty: We actually don't use the nvie git branch model. Because we use Gerrit, it's actually much closer to the Linux kernel model where people send in patches via email. We don't make branches and then merge them. To a degree, each change winds up being like a virtual private branch. Rather than fixing a change by making a new commit and adding on top of the branch, we just amend the previous change. So we are working on individual commits that are going to land, rather than working on a branch merge model.
A developer could use a local branching model on their laptop if they want to, but those branches aren't published branches. I don't know what branches Jim may use on his laptop. I use git in a weird way without any branches, I just reset refs a lot on my master because I'm crazy - I don't recommend it to people who are new to git.
It's actually Gerrit that forms the basis of our patch-based git workflow.
Jim: We want to make sure that when people are reviewing, they are reviewing individual commits. Ideally, each commit that goes into the project has been looked at properly. There are no messy branches. We are very deliberate about making each commit as good as it can be and then merging that.
InfoQ: Besides Zuul, you mentioned using Gearman to make Jenkins scalable, using Logstash to debug, and Test Repository to stream the test output automatically to committers. Currently, how is the feedback mechanism working? What do you want it to be like?
Monty: It’s getting better and better all the time. There are several things about this. Gearman is for Jenkins scalability. Jenkins was actually designed to be only a master, but there was a hack later. We now run a Jenkins master with a number of slaves to execute test jobs. We run a lot more parallel test jobs than most Jenkins installations do. There are many design points in Jenkins that involve global locking, and when you use it the way we do, you consistently hit scaling problems.
Jim: Because Jenkins wasn't designed to be used the way we do.
Monty: So we wrote the Gearman plugin for Jenkins, which lets Jenkins register all of its jobs as potential Gearman jobs with the Gearman server. Then we can have multiple Jenkins masters for the same set of jobs, and Gearman will parcel out the jobs, so that if one Jenkins master runs into scaling problems, we just add another Jenkins master.
Jim: Usually, after adding about 100 slaves to a Jenkins master, it will start to have problems. Like I mentioned, we had about 340 at once. That means we need about 3.4 Jenkins masters to handle that kind of load.
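The arrangement described above, where several masters register the same jobs and a central server hands work to whichever one asks, can be sketched like this. It is a toy model and an assumption throughout: `ToyJobServer` and `masters_needed` are hypothetical names, and nothing here uses the real Gearman protocol or the Jenkins plugin API.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Optional, Set


def masters_needed(workers: int, workers_per_master: int = 100) -> int:
    """Whole number of masters needed if each handles a fixed worker count."""
    return math.ceil(workers / workers_per_master)


class ToyJobServer:
    """Hypothetical stand-in for a job server that parcels out work."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.registered: Dict[str, Set[str]] = {}  # master name -> job names

    def register(self, master: str, job_names: List[str]) -> None:
        """A master announces which jobs it is able to run."""
        self.registered.setdefault(master, set()).update(job_names)

    def submit(self, job_name: str) -> None:
        """Queue a job request; any capable master may pick it up."""
        self.queue.append(job_name)

    def grab(self, master: str) -> Optional[str]:
        """Called by an idle master: return the oldest queued job it can run."""
        can_run = self.registered.get(master, set())
        for job in list(self.queue):
            if job in can_run:
                self.queue.remove(job)
                return job
        return None
```

Under the rule of thumb of about 100 slaves per master, `masters_needed(340)` rounds the 3.4 up to 4. Because every master registers the same job names, adding capacity in this model is just another `register` call from a new master.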
Monty: The other one that is really interesting, especially in the last cycle, was setting up the Logstash cluster. Each DevStack run installs the entire cloud, then runs tests against it. Even just installing the cloud produces a bunch of logs - you get nova logs, glance logs, etc. If there is a problem, it’s really hard for a developer to debug. So all of these logs are thrown into a very large Logstash cluster, which indexes the logs using Elasticsearch. We can then have developers look through the logs, looking for a pattern of what is happening. Joe Gordon, Sean Dague and Clark Boylan wrote Elastic Recheck.
Jim: And I wrote the graphs for it.
Monty: As we hit flaky jobs that fail the tests, we'll then run scripts on Logstash to see if it is a type of failure that we have seen before. If it is, then we'll send you a link to what the bug might be. That is very helpful for people to find complex problems.
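The idea of checking a failed run against failures seen before can be sketched as a signature lookup. This is a simplified illustration, not how Elastic Recheck is implemented: it matches regular expressions against in-memory log lines instead of querying the Logstash cluster, and the bug identifiers and patterns below are made up for the example.

```python
import re
from typing import Dict, List

# Hypothetical signatures: bug identifier -> pattern that marks the failure.
KNOWN_FAILURES: Dict[str, str] = {
    "example-bug-1": r"Timed out waiting for .* to become ACTIVE",
    "example-bug-2": r"Connection to .* refused",
}


def classify(log_lines: List[str],
             signatures: Dict[str, str] = KNOWN_FAILURES) -> List[str]:
    """Return the identifiers of known bugs whose pattern appears in the logs."""
    matches = []
    for bug, pattern in signatures.items():
        regex = re.compile(pattern)
        if any(regex.search(line) for line in log_lines):
            matches.append(bug)
    return matches
```

A run whose logs match a signature can be pointed at the corresponding bug, while an empty result suggests a failure that has not been seen before and needs a person to look at it.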
Jim: I think this is really unique and cutting-edge. There aren't many projects that are this big and running so many tests and generating so much data that is available to help developers. As a developer working on your machine, it is very hard to see all of the problems that might come up after running your code a lot. Our test system runs your code a lot! And we will be working more in this cycle to see if we can automatically classify and identify changes and behaviors that might help developers track down problems.
InfoQ: In general, what do you think is the hardest thing when working on this whole automation thing?
Monty: We have a lot of developers writing a lot of code. Our workload doubles every 6 months, the commit load just keeps going up, and we have to imagine what the next set of problems is going to be before they happen. Because once they happen, there is not enough time to develop a system that can fix it. We are automating not to solve today’s problems but to solve problems that will come in the next 3 months.
And because we do all of this testing and changes are required to pass the tests, you have to make sure this system works every time. We have to be able to run the tests 10000 times a day, so if the machinery isn't good, you might return errors to developers that aren't errors with their code, but errors in the machinery. We have to be very careful to write very solid automation, otherwise it's worse than having no automation. Also the internet breaks all the time. We have to work around the internet breaking - that's about half of what we do. All the sites, they all break. You don’t notice unless you’re hitting them with automation 10000 times a day! If GitHub is down 1% of the time, as a user that’s fine, you just retry. If my test systems are pulling from GitHub 10000 times a day, then that's 100 failures.
Actually we are very good performance testers of the two public cloud providers we run on. Sometimes we notice a problem, we'll call their operators and say, "Hi, is there a networking problem in your datacenter?" And they say, "Oh yes, we just noticed that too!"
Jim: Both of them are OpenStack clouds. So basically we are running tests for OpenStack on OpenStack. On one hand we are testing the project code itself, on the other hand we are testing it on the operations side. It is actually very cool.
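The arithmetic behind that last point, and the effect of simply retrying, can be worked through in a few lines. This is a sketch under a strong assumption that failures are independent of each other, which real outages are not; `expected_failures` and `with_retries` are hypothetical helpers, and retrying is only one possible mitigation alongside the caching mentioned earlier.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def expected_failures(requests_per_day: int,
                      failure_rate: float,
                      attempts: int = 1) -> float:
    """Expected requests per day that fail on every one of `attempts` tries.

    Assumes each try fails independently with probability `failure_rate`.
    """
    return requests_per_day * failure_rate ** attempts


def with_retries(fetch: Callable[[], T], attempts: int = 3) -> T:
    """Call `fetch` until it succeeds, re-raising the last OSError otherwise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for _ in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # network errors are OSError subclasses
            last_error = exc
    assert last_error is not None
    raise last_error
```

At 10000 requests a day and a 1% failure rate, a single attempt gives about 100 failed requests, matching the figure above. Under the independence assumption, three attempts would bring that down to roughly 0.01 a day, which is why automation at this volume cannot treat a single failed fetch as a test failure.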
About the Interviewees
Monty Taylor is a Distinguished Technologist at HP. He leads the teams that run the OpenStack Developer Infrastructure, the Ironic Bare Metal service and the TripleO project, which uses OpenStack to deploy OpenStack. He is one of the OpenStack Founders and currently sits on both the OpenStack Foundation Board of Directors and the OpenStack Technical Committee.
Jim Blair is the Principal Infrastructure Software Developer for OpenStack, as well as an OpenStack CI core developer. He is also on the OpenStack Technical Committee and is the PTL of the OpenStack Infrastructure Program. He currently works for the OpenStack Foundation.