
John Leach on Virtualization and Unix Tools for the Data Center

   

1. John, who are you?

I'm the Technical Director of Brightbox, which is a virtualized hosting provider with a focus on Ruby on Rails.

   

2. What does it mean to run Brightbox? What do you do?

When we started out two and a half years ago, it meant doing everything (because there was just me and my partner Jeremy): all the business stuff, building everything, supporting all the customers. Nowadays we've got more people to help us out. At the moment I'm managing some of the development, managing sysadmins, engineering new systems, and doing some support, too.

   

3. Give us a look inside Brightbox. What is actually behind the scenes? What's going on? Do you actually have machines? Do you farm that out?

We buy hardware and put it into data centers in Manchester. We run Xen on those machines, and inside Xen we create virtualized guests, which at the moment are entirely Ubuntu boxes, and inside those we host Rails. We started out just doing guests with a little stack in there, Apache and Mongrel and so on and so forth, but it's growing, and we now do dedicated MySQL clusters and things for people. We're running MySQL inside there as well.

   

4. You are running virtualized images. Do customers get to build them themselves or do you provide them? How does that work?

We started out with Ubuntu Dapper, customized it and added all the right packages for Ruby on Rails deployment. Not long ago, when Hardy came out, we updated to Ubuntu Hardy; we provide our customizations, and all the usual packages are available. But then we build our own packages, so we've got the Ruby Enterprise stuff the Phusion guys have done, and Passenger packages; we maintain those. Anyone running Ubuntu can install Passenger and Ruby Enterprise, because we provide those packages for anyone, whether they're on Brightbox or not.

The Sphinx packages we've customized; in fact there weren't any Sphinx packages in Ubuntu, so we built those. We cater to the Ruby and Rails ecosystem, providing packaging and support for it on Ubuntu, but we're planning to add other distros. I think we'll start by allowing people to use plain vanilla installs of maybe CentOS and Debian and so on. But we definitely want to get to the point where people can create their own image, bring it along and run with it. We're working on the tools for that at the moment.

   

5. Why Ubuntu? Is there a specific reason for using Ubuntu over, say, plain Debian?

I started out hosting stuff for myself; I was already familiar with Ubuntu and had some great experiences running Ruby and Rails on it. A lot of packages were already there. I had a go at making Sphinx run on CentOS and failed pretty miserably. And I don't use Gentoo because I don't like waiting for things to compile.

It was a pretty good fit. We really like Debian, its packaging system and a lot of the things it provides, but Ubuntu was a little more modern at the time, so it seemed like a good place to start, and it worked out really well. Ubuntu is just a snapshot of Debian anyway. You can get some support and things from the Canonical and Ubuntu people, though we were filling that role ourselves. It's got a good community, so there is a lot of support.

   

6. Do you build the virtualized images yourself, or do you use tools to produce them?

The way we build things is pretty straightforward: we build everything we want in a chroot environment. You use debootstrap, which is a tool for building a very minimal initial system. Then we chroot in there (it's almost like having a virtual machine), install the packages we need, make the changes we want to make, and then, in our case, just tar it up and keep the tarball somewhere.

Then our deployment system just sucks down that tarball and unpacks it into a fresh filesystem. It's that, in combination with packages from our own package repository: some things are packaged, and some things need a bit more customization, so they're kept separate.
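
A minimal sketch of that build process, assuming a Debian or Ubuntu host with debootstrap installed; the release, mirror, paths and package list here are illustrative, not Brightbox's actual setup:

    debootstrap hardy /srv/images/hardy http://archive.ubuntu.com/ubuntu
    # chroot in and treat it like a tiny virtual machine
    chroot /srv/images/hardy apt-get update
    chroot /srv/images/hardy apt-get install -y ruby irb rdoc
    # tar the finished tree up for the deployment system to unpack later
    tar -C /srv/images/hardy -czf /srv/images/hardy.tar.gz .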

   

7. You have your own apt-get sources that run locally, and the images can pull packages from there.

Yes, that's right. The machine starts out with most of the packages you need, and we provide updates to our packages through our apt repository. It also allows anyone using Ubuntu, or even some versions of Debian, to stick our repository in their apt configuration and start using our packages.

A lot of our customers install a staging box at their office: they just get an Ubuntu CD, sling it in, install it, then add our repository, and by and large they've got a Brightbox, with everything working the way they want. Then they just deploy to us and it's pretty much the same, so it works out really well.
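
An illustrative version of that setup on a stock Ubuntu box; the repository URL and package names are hypothetical stand-ins, not the real Brightbox repository:

    echo 'deb http://apt.example.com/ubuntu hardy main' >> /etc/apt/sources.list
    apt-get update
    # the customized packages then install like any other
    apt-get install ruby-enterprise libapache2-mod-passenger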

   

8. What kind of packages do you put in your own repositories? What are the reasons for that?

There was no support for Passenger in Debian when it first came out, so we built packages for it. It's since been accepted into Debian, and therefore it's in Ubuntu, but it always tends to lag behind a little. We started off providing it because there was nothing, and we keep providing it because ours is always the latest version. Debian especially, and Ubuntu too, will choose one version and just release bug fixes for it; we run closer to the bleeding edge. We're always providing the latest versions for people, after some testing period, so it's a little less conservative than Debian.

There were no packages for Sphinx, and I don't think it's been accepted yet, so we had to build them. Ruby Enterprise Edition: I don't think that will ever get into Debian the way it is, so we provide packages for that, so people who want to run it don't have to stick with the normal versions of Ruby, and it makes everything really simple. What else? New versions of monit. Basically, stuff moves so quickly in our community that Debian and Ubuntu get left behind, so we just keep providing the latest versions of the stuff people want.

   

9. Essentially a diff to Debian that allows you to keep up, to stay more modern.

Yes. A more conservative Debian user might be shocked by monit changing from version 4 to version 5 within the same distro. That would rock their world! From a Rails point of view it's more like, "Haven't we got version 5 yet? What's going on? Where is my Sphinx package?"

   

10. Coming back to virtualization, how do you use virtualization?

Everything we run is virtualized, and there are actually different parts of the stack that we virtualize. Starting at the bottom, our disks are virtualized: we have a RAID controller with our disks in it, striped into a RAID array, and you could argue that's virtualization. On top of that we tend to put LVM, which is literally virtualized storage. We get virtual volumes; you might see them as partitions, but they're completely different, with all the cloning and mirroring and things you can do with them.
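
For example, a minimal LVM sketch, with hypothetical volume group and volume names:

    lvcreate --size 20G --name guest1-root vg0     # carve out a guest volume
    # copy-on-write snapshot, the basis for cloning and consistent backups
    lvcreate --snapshot --size 2G --name guest1-snap /dev/vg0/guest1-root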

Then obviously there's the guest itself, which is a virtualized guest in the normal sense, and the guests run on networks in VLANs, which is virtualized networking, really. All the way up we use virtualization. I think there is one exception, which is the firewalls. We run our own Linux-based firewalls raw on bare-metal machines, just because some of the VLAN networking we need is a little tricky with virtualization. So there's no virtualization there.

   

11. What kind of physical machines do you have? Massive mainframes, a cluster of machines, or what does it look like?

A standard rack at the moment is filled with 2U Dell boxes; I think the latest one is the R510 or something. They tend to be 2.x GHz quad-cores with 2 GB of RAM, and we just fill a rack with those. Traditionally we'd have been using 2950s from Dell, but they discontinued those. So, nothing particularly special.

They're quite beefy machines, but we're not cramming huge numbers of guests onto one machine, as we'd have problems with contention. We've found a good balance with that kind of hardware.

   

12. Virtualization tends to have a bit of an overhead, particularly with IO. What's your experience with that?

We've not really had any problems. We tend to avoid centralizing everything and try to decentralize as much as possible. The benefit is that we're scaling horizontally: we spread our MySQL instances, say, out evenly rather than putting them all on one machine. It balances out quite well. We've never really been hit with a problem where we've gone, "this is virtualization causing this."

Sometimes fingers get pointed and it's "this might be a virtualization problem," but it rarely ever is. There's so much more you can do in terms of tuning inside the guest and outside the guest; it's not actually the virtualization that's the overhead. The majority of Rails projects tend to be RAM-bound: they just need RAM and are probably not using much disk I/O at all. If you look at a typical website, you've got some assets being uploaded that are written to disk, read once and, in our case, stuck in a CDN. Even if they're CPU-bound, you still get about 96% of the performance of a bare-metal CPU.

In most cases you've got direct access to your RAM, and there's not much overhead there because you've got hardware assistance for it. CPU overhead may be 4-5%, and that's not where Rails is being squeezed, so you just don't notice it. With disk I/O at the high end there's more overhead to notice, but because of the decentralized way we do things, it has not really been a problem.

   

13. You mentioned that when users point at virtualization as the cause of problems, there are often other ways of tuning the application. What would some of those problems be? What are some of the bad patterns you see?

We manage MySQL systems that run inside virtualized guests, and when we run them we do a lot of tuning to make sure they behave in a sane way. MySQL is pretty complicated, and when customers run their own it's easy to make mistakes. The usual ones are around the RAM pools: say you're using InnoDB, the buffer pool settings are the obvious thing that goes amiss. You don't notice it when you're small; you notice it when everything gets busy.

Usually it's just obvious stuff that wasn't an issue when you were small and becomes a problem once you've grown. But things like disk schedulers, which decide how best to order requests in the disk queue, can have a big impact on how slow the system feels. With MySQL in particular, on some of the default disk schedulers you'll end up with writes starving out reads, so we tune the scheduler to try and get a good balance.
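
Two illustrative knobs of that sort; the block device and the buffer pool size are hypothetical and depend entirely on the machine:

    cat /sys/block/sda/queue/scheduler                # show the active scheduler
    echo deadline > /sys/block/sda/queue/scheduler    # e.g. deadline instead of cfq

    # and in my.cnf, the InnoDB setting that so often gets left at its default:
    #   [mysqld]
    #   innodb_buffer_pool_size = 1G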

For our hosted MySQL boxes, the settings in the Dom0 are slightly different from how things are elsewhere. We run our own firewalls, and we make sure you're not running your own firewall inside your virtual machine, because connection tracking and so on and so forth is overhead. There are some TCP settings and things in Nginx too, but nothing else comes to mind.

   

14. That’s a lot of machines, a lot of virtual images flying about. How do you handle that? How do you tie it all together? Is it manual?

It was manual when we first started, and that was hard work. We built what started out as a simple Rails app just to keep track of all our resources: the physical machines, the SANs we had at the time, which networks we had, IP addresses, all that kind of stuff.

Our system's called Honcho. It's grown now, but it's still the same Rails app, and it has all the information: it knows about all the guests, the IPs they have, where they are, who they belong to and how big they should be.

All our back-end systems use ActiveResource to interact with Honcho: rebooting things, making sure stuff's booted all right, shutting stuff down when people cancel, that kind of thing. Honcho is the brain of everything at the moment.

   

15. Do you use Puppet to manage user systems?

We've been using it for a year and a half, maybe a couple of years. It really transformed the way we did things. We used to jump through hoops with Capistrano to do basically what Puppet can do for you, but it's a lot harder work with Capistrano: the way tasks are defined, and what information a task has about the machine it's running on, just don't lend themselves to it. So we spent a lot of time writing scripts and installing them on virtual machines to be called by Capistrano, which is essentially what Puppet does for you.

We moved to Puppet and now pretty much use it for all our managed systems; it's how we manage everything for customers. When a customer buys a cluster from us, say a couple of Nginx frontends, maybe 5 or 6 application boxes and a couple of MySQL boxes, we manage each layer of that stack with Puppet nowadays, or at least the initial install, anyway. But we also use it on the backend to manage our Xen machines as well.

Some people use PXE boot systems that bootstrap physical machines preconfigured. We tend to preinstall them with CentOS and then use Puppet to customize them exactly as we need. When we make changes to our systems we can apply them to all the machines at once; we don't want to have to go and reboot them all. It works well; we really like Puppet.
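
As a flavour of it, a minimal standalone sketch; the manifest and the service it manages are illustrative, not Brightbox's actual config:

    # write a tiny manifest that keeps nginx installed and running...
    cat > /tmp/web.pp <<'EOF'
    package { 'nginx':
      ensure => installed,
    }
    service { 'nginx':
      ensure  => running,
      enable  => true,
      require => Package['nginx'],
    }
    EOF
    # ...and apply it locally
    puppet apply /tmp/web.pp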

   

16. This brings us to the talk you gave yesterday here at the Scottish Ruby Conference, about how the Ruby world can improve its operations by looking into the past, and the present, of Linux tools. What point were you trying to make?

My history started out in the Unix world, well, Linux, and then I came to Ruby on Rails. I actually learnt Ruby in order to do Rails, and obviously I've gone off and used Ruby for all kinds of Unix stuff as well. From that point of view, it's interesting to me that there are people who've come from the other side: they're not sysadmins, they learnt Ruby and Rails, and now they're having to deploy on Linux systems without a background in it.

We noticed time and time again that people are repeating the same mistakes, and more than that, they're engineering new things to solve problems that we in the Unix world solved decades ago and have been improving, literally, for decades in some cases. I've mentally been putting together, for quite a while, all kinds of scenarios where you don't need to write something in Ruby or Rails, because it's already available for you in Unix and will do it for you. We're trying to bring the Unix world to the Ruby developer. I rushed through a lot of stuff, mainly to let people know these things are available.

The details are important, but it's more "do you know the kernel can do this for you? Do you know the file system works like this?" You can go and figure the details out later. I had some really good feedback from it, so it seems to be quite interesting. I did a similar talk from the other point of view, where you speak to the Unix people and say "do you know you can use Capistrano to speak to 100 machines at once? Do you know you can use Puppet to do this kind of management?" They think, "well, this is fascinating, bringing these worlds together." They should speak more; they don't go to the same parties.

   

18. You mentioned that you did the talk the other way around, essentially teaching Unix people about how things are done in the Ruby world. Is there anything that Unix people can learn from the Ruby world? Any tools that are not available?

Yes, for sure. I think Capistrano is a classic example: you've got lots of boxes, all pretty much identical, and you need to do something on all of them. Capistrano handles interacting with those machines really well. I wasn't aware of any tools outside the Ruby world that do what Capistrano does. I've come across a couple since, but they're nowhere near as mature as Capistrano has become. I'd definitely recommend things like Capistrano.

Puppet and Chef are both Ruby things, and the way they work is very different from how the Unix world tends to work. Certainly the way Debian and Ubuntu, or most distros, work is very much "we build packages to do things." I agree with that to a large extent, but there are some cases where that's not how it works, and Puppet and Chef solve those problems really well. I think they should be very interesting to the Unix people; they were very interesting to us.

Brightbox has become a bit of a rescue center for corporate sysadmins. We've got these corporate sysadmins who've been working with Unix for ages and know what they're doing, but at some big corporations they were underutilized and fed up. We bring them in, we feed them, we give them a nice warm blanket and a nice warm Capistrano config, and they love it. It's great.

   

19. What can those old Unix graybeards teach the Ruby world? What are some examples, some of the tools, e.g. for monitoring?

Monitoring is a good example. The usual case is (or used to be) that you'd have a pack of Mongrel processes running, say 6 of them. You've got a bug somewhere in your app, maybe a memory leak in Ruby, and eventually they start using 300-400 MB of RAM each, which is obviously a problem that's going to wreck your machine.

Traditionally you'd run monit, which is a daemon that wakes up every 30 seconds, knows that 6 of these Mongrels are supposed to be running and that they should really be using about 150 MB of RAM each, and decides that if any of them are using too much, maybe it should run the stop script.

Then it runs the start script, and: "oh, it didn't respond to the stop script, what shall I do?" The worst case is that it only wakes up every 30 seconds, and if in that 30-second window one of those processes grabs 200 GB of RAM, monit is not going to help you.
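
A minimal sketch of that kind of monit check, with a hypothetical pid file, port and memory threshold:

    set daemon 30                      # wake up every 30 seconds
    check process mongrel_8000 with pidfile /var/run/mongrel.8000.pid
      start program = "/usr/bin/mongrel_rails start -d -p 8000 -P /var/run/mongrel.8000.pid"
      stop program  = "/usr/bin/mongrel_rails stop -P /var/run/mongrel.8000.pid"
      if totalmem > 150 MB for 2 cycles then restart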

It turns out we already have something for this: not really monitoring, but the equivalent thing in the kernel, for when your Mongrel process says "I'd like a GB of RAM please, I need to iterate over a huge number of rows I accidentally got from the database."

Before we start Mongrel, we can tell the kernel: "never let this Mongrel use more than 300 MB of RAM. It should never be given more than 300 MB of RAM." Instead of waking up every 30 seconds and polling to see what things look like, the moment the Mongrel process asks the kernel "can I have 301 MB of RAM, please?" the kernel says no. It's a little harsh, because there's not much you can do to rescue that situation without a lot of hard work.

You tend to end up with one dead Mongrel, but that's better than all 6 of your Mongrels dead, your filesystem corrupt and your whole machine hung. They're called resource limits, and they're dead easy to use, nothing special; I think any Linux since forever has had them. It's a command you run before you start your Mongrels, and you've got that protection built in. That works out really well.
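
As a sketch, assuming a bash shell and the 300 MB figure from above (the port is illustrative):

    ulimit -v 307200          # cap virtual memory in kilobytes, roughly 300 MB
    mongrel_rails start -d -p 8000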

You'd probably use it in combination with monit, because obviously it's not going to help you if something stops and doesn't restart, but it certainly stops you overloading a system. Some people have come across limits when they've run out of file handles, "too many open files"; that's just another example of the same limits.

It just happens that there's a limit on file handles by default, while there's no limit on the RAM you can use. Some other examples, the obvious ones: everybody already knows about the cron daemon. There's a cron daemon running; if you want a Rake task that runs every 10 minutes or something, you write a config entry and ask the cron daemon to do it for you, and it runs your script. I don't think it ever occurs to anyone to write their own cron daemon in Ruby.
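
For instance, an illustrative crontab entry; the app path and Rake task name are hypothetical:

    */10 * * * *  cd /srv/myapp && rake stats:refresh RAILS_ENV=production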

Everyone knows the cron daemon is there, but there are other examples, like the at daemon, which is pretty much the same thing except that instead of running things regularly, you run one-offs. You can say "run this script once at 3 a.m., please" and it will handle it for you, so you don't have to worry about remembering to do it, or about adding a crontab entry and then remembering to remove it.

It's more advanced than that, too: you can say "run this script when the system is not very busy." at is installed on fewer machines than cron, but it's not a lot of work: 'apt-get install at' and you can literally start scheduling tasks straight away.
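
A quick sketch of both, with hypothetical scripts; at reads the commands to run from stdin:

    echo '/srv/myapp/script/nightly_report.sh' | at 3am
    # batch, from the same package, holds the job until the load average drops
    echo 'cd /srv/myapp && rake reports:rebuild' | batch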

   

20. You mentioned using SMTP for messaging. How does that work?

That might come across as a little wacky, but I really like it. Everyone is getting excited about RabbitMQ for queuing, for reliable messaging between systems. I like to point out that we already have reliable messaging between systems, we've had it for ages, and you probably have it on your machine: the SMTP daemon. It solves different problems than Rabbit, but there are a lot of cases where it can be used. We use it with a lot of distributed MySQL boxes all over the place.

They log information about slow queries, and we aggregate all of that and put it into a control panel, so our customers have a nice interface showing how they can improve their queries. We used to have a log-shipping thing, with SSH keys all over the place and files getting copied around and so on.

We had to be careful to make sure those jobs ran at the right time and in the right order. What we use now is an SMTP network between the machines, together with logrotate, which is something else I mentioned in the talk. At the end of every day the logs are rotated, and the day's log is emailed to an address.
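
A sketch of what that logrotate stanza might look like; the log path and collector address are hypothetical:

    /var/log/mysql/mysql-slow.log {
        daily
        rotate 7
        missingok
        mailfirst                              # mail the freshly rotated file
        mail slow-logs@collector.example.com   # rather than the expiring one
    }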

A number of systems receive that email and deliver it into a mail directory. Then we have a Rake importer task that looks in that mail directory and parses the messages, because they're just log files in a directory, and deletes them as it imports.

That works out really well because, for a start, it's reliable: there's a local mail server on each MySQL box, so logrotate doesn't have to worry about making sure the logs get where they're going. It just hands them off to the mail daemon, and the mail daemon tries to send them where they're supposed to go; if it can't, it tries again later.

It eventually gets to where it's going. We get load balancing built in as well: we have MX records for a number of machines that will receive these logs, and the mail daemons on the MySQL boxes pick one of them at random, because they're all the same priority. If one of them is down, it tries another; if they're all down, it tries again later. So we get load balancing built in, it's reliable, and it's secure the way we use it, with TLS and so on. It's really neat, and it's just there for us to use without much overhead. We've been using SMTP for decades; we've got it sorted and we know how it works.
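
The load balancing is just ordinary DNS; an illustrative zone fragment, with hypothetical names:

    ; equal-priority MX records: senders pick one at random and
    ; fall back to the others if it's down
    logs.example.com.    IN  MX  10  collector1.example.com.
    logs.example.com.    IN  MX  10  collector2.example.com.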

If you've got a proper mail daemon and good RAID, you shouldn't ever lose any mail and you should never get any duplicates. There's no polling involved; we're not using IMAP or a POP daemon to ask "oh, have you got any more of these?" We just look at the files in the directory. It works out really well.

I wouldn't use it for everything. These logs aren't big; I probably wouldn't be emailing 100 GB of logs around. But it works pretty well in this situation.

Aug 03, 2010
