Luke Marsden on ZFS and the Docker Ecosystem


1. [...] Luke can you start out by just telling us a bit about yourself and your background?

Chris's full question: Hi. This is Chris Swan, one of the Cloud editors at InfoQ. I am here at QCon London with Luke Marsden. He has just done a presentation on Flocker and Powerstrip and some of the other things he does at ClusterHQ. I am going to try not to ask him too many questions about that. So, Luke can you start out by just telling us a bit about yourself and your background?

Sure. Thanks for having me on the show, Chris. I guess this all gets turned into a show of some description. I started out getting involved in infrastructure in 2001 when I started a small shared web hosting company with a friend of mine, and that sort of paid for the beers as I went through university, and it also taught me some of the perils of running production systems. So, we actually had our data center just here, a couple of miles from here, in London, but I was studying at Oxford and so the pager would go off in the middle of the night and I’d have to go swap out a failing disk from a machine or come and restart a MySQL database in the middle of the night – and that was never fun. So, I was sort of exposed to some of these production problems early on and one of the things that I looked at in my degree was doing concurrency across a distributed system. So, when I finished my degree I was kind of inspired to try and solve some of these practical problems we had in the hosting company and came up with an architecture where you would have a set of machines in a cluster, there would be no single master and the system would automatically reorganize itself. So, if there was a failure, another machine would take over. If one machine got too busy, we would move some of its containers onto other machines. I said containers and that is kind of interesting because back in 2008, we were starting to work on solving these data problems for containers. Back in 2008 it was FreeBSD Jails, but a lot of that same technology has translated through to the work we are doing in the Docker ecosystem now.
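
The "move containers off a busy machine" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm the hosting company used: the function name `rebalance`, the load numbers and the threshold are all hypothetical, and a real masterless cluster would also have to agree on these moves between nodes, which this sketch ignores.

```python
def rebalance(placement, load, threshold):
    """Greedy sketch: move small containers off the busiest machine.

    placement: dict mapping machine name -> list of container names
    load:      dict mapping container name -> positive load figure
    threshold: machine load above which we try to shed containers
    All of these names and units are hypothetical.
    """
    # Work on a copy so the caller's placement is left untouched.
    placement = {m: list(cs) for m, cs in placement.items()}

    def machine_load(machine):
        return sum(load[c] for c in placement[machine])

    moves = []
    while True:
        busiest = max(placement, key=machine_load)
        idlest = min(placement, key=machine_load)
        # Nothing to do if no machine is over the threshold, or the
        # busy machine only runs a single container.
        if machine_load(busiest) <= threshold or len(placement[busiest]) <= 1:
            break
        container = min(placement[busiest], key=lambda c: load[c])
        # Stop if moving the container would not even out the load.
        if machine_load(idlest) + load[container] >= machine_load(busiest):
            break
        placement[busiest].remove(container)
        placement[idlest].append(container)
        moves.append((container, busiest, idlest))
    return placement, moves


new_placement, moves = rebalance(
    {"a": ["web", "db", "cache"], "b": []},
    {"web": 50, "db": 40, "cache": 20},
    threshold=80,
)
```

In this example machine "a" starts above the threshold, so the sketch moves its two smallest containers to the idle machine "b" and then stops.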


2. Has Docker really been a timely arrival on the scene now that containers have become mainstream? What challenges have you had adapting from a world of BSD to a world of Linux?

Well, most of the code ported across quite nicely because it was Python and the ZFS on Linux port is looking really healthy. We certainly had some problems with BSD’s ZFS port early on which is why we hired Andriy Gapon who wrote most of that code to fix it for us. But no, in general, there is a different set of workloads being run in Docker containers compared to the shared hosting workloads that we were doing before, and that has actually led to some nice simplifications we have been able to make. So, the way that networking works for containers, as you know, Chris, it really just relies on TCP and so you are able to say, in a Dockerfile, for example, “Expose port 80” and then when you run a linked container, you just say “Link me to this other container on this specific port”. So, whereas before, we did this crazy layer 7 proxy that was called Awesome proxy, where we had to have HTTP, SMTP, MySQL binary protocol, FTP, SSH – we did IMAP and POP, as well. So, we had this crazy set of protocols that we had to support, but moving into the Docker universe, it is actually much simpler now and you can deal with TCP and ports. So that is nice.

Chris: So Docker has allowed you to simplify your world by concentrating just on storage and leaving network to somebody else. On the subject of storage - you touched on ZFS already. Not everybody will be familiar with ZFS. So, tell us more about what that is and why it is better than some of the other alternatives.

Sure. So, I think of ZFS as a really beautiful storage analog to containers for compute and the reason I say that is that containers give you light-weight portable compute and ZFS gives you light-weight portable volumes. So you can allocate hundreds or thousands or even tens of thousands of volumes per ZFS pool and therefore, by slicing up a zpool into lots of little pieces, you get these file systems that can be allocated at the same sort of densities that we are seeing people getting excited about for containers. So, thousands or tens of thousands of containers per node. The other benefit is that these file systems are then portable, they are independently snapshottable and they can be replicated between machines. So, you can take these thin slices and then easily, seamlessly, move them around between different hosts in the cluster. And that is pretty powerful.
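
To make "snapshottable and replicable" a little more concrete, here is a small sketch that builds the standard `zfs snapshot` and `zfs send | zfs receive` command lines for one volume. It only constructs the commands and does not run them. The dataset name `tank/volumes/db`, the snapshot names and the host `node2` are made up for illustration, copying over `ssh` is an assumption, and nothing here claims to be how Flocker itself drives ZFS.

```python
import shlex


def snapshot_command(dataset, snapshot):
    """Return the argv for taking a snapshot of one ZFS dataset."""
    return ["zfs", "snapshot", f"{dataset}@{snapshot}"]


def replicate_pipeline(dataset, snapshot, target_host, previous=None):
    """Return a shell pipeline that copies a snapshot to another host.

    If `previous` is given, only the changes since that earlier snapshot
    are sent (an incremental send using `zfs send -i`).
    """
    send = ["zfs", "send"]
    if previous is not None:
        send += ["-i", f"{dataset}@{previous}"]
    send.append(f"{dataset}@{snapshot}")
    # Assumption: the target host is reachable over ssh and has a pool
    # with the same dataset path.
    receive = ["ssh", target_host, "zfs", "receive", dataset]
    return " | ".join(shlex.join(cmd) for cmd in (send, receive))


print(shlex.join(snapshot_command("tank/volumes/db", "monday")))
print(replicate_pipeline("tank/volumes/db", "monday", "node2", previous="sunday"))
```

The incremental form matters for moving volumes around a cluster: after the first full copy, only the blocks that changed between two snapshots need to cross the network.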


3. ZFS and DTrace always seemed to me to be two of the things that kind of came out of Sun that were notably better than some of the engineering that could be seen elsewhere. Why has it taken so long for ZFS to make its way over to Linux?

ZFS took a while to come to Linux because of some licensing concerns, which are basically resolved now. We have had an IP lawyer look at the issues around that and we are pretty comfortable with that now. But that, let’s say, spread a lot of FUD around ZFS as a file system. It has never really been in doubt that the Solaris engineers who came up with ZFS were very good and continue to be very good and OpenZFS is actually a thriving project. But because it was open-sourced under the terms of the CDDL, it has to be shipped separately from the mainline Linux kernel. So, there has definitely been some slowdown there caused both by the patent issues and also the licensing issues. We are seeing it being deployed on a very wide scale now, across a lot of very big organizations, using it to manage petabytes of data. So, we’re comfortable with it.


4. [...] Is ZFS going to become a tier one supported file system for Docker?

Chris' full question: For your average Linux user, what is now involved in getting ZFS onto their machines so they can start doing things with Flocker? We’ve been talking in the Docker track today about other options like OverlayFS and AUFS and device mapper and all of those things. Is ZFS going to become a tier one supported file system for Docker?

Yes, I believe it will. Although there is an important distinction to draw there between the overlay file systems which are used for managing images and ZFS which, in the Flocker case, is used for managing volumes. So, images get instantiated into the root file system of a container that consists of multiple layers whereas volumes get mounted separately at specific mount points within that container. They are really two different types of things. So it makes sense for OverlayFS or AUFS or whatever happens in that world to carry on getting better and we sure know that it does need to get better, and ZFS to be used in parallel for managing and orchestrating the stateful workloads and the stateful volumes.


5. To be clear on that, you see a world where you are kind of booting your container with its root in OverlayFS, but then using portable volumes on ZFS through Flocker or whatever other means?

Let’s call that the pragmatic viewpoint. I think it is likely that that will be the case. The idealist, the engineer in me, wants to see a pure ZFS-on-root Linux distro which gives you ZFS both for the images, which is plausible, and also for the volumes.


6. OK. So what does it take to make that happen?

Well, we need to get Brandon Philips and convince him ZFS is safe and good. Actually CoreOS might be quite amenable to that now because they recently reverted from Btrfs to ext4 for their root file system because they had some issues in production there. But in all seriousness I think that, as people start wanting to move their stateful workloads into Docker and they discover that Flocker exists and it allows them to do that, they are going to just use it. It’s going to be less about which file system we are using, or why we are using this file system, and more about the fact that we need to solve this business problem. And I think as we see containers start to hit the mainstream and people really just demand solutions to their problems, then I think we will get ZFS into more computers that way.


7. You touched earlier on using Python. Is ClusterHQ a Python shop or a whatever-tool-it-takes shop?

Funny story – we actually have most of the, or a large proportion of the Twisted core team on our staff. So we are very agile and efficient when it comes to developing things using Twisted which is a Python networking framework. So, long may that continue, it is an excellent set of abstractions on top of network programming, which is hard to get right. That said, we are also hiring and we are looking for both Go developers and Python developers, and Go is clearly an important part of this ecosystem. And as we start pushing some of the work that we are doing in Powerstrip etc. into Docker itself, it will become increasingly important that we get that sort of talent on board as well. So, if anyone is in the South-West, and is a Python or Go programmer, come and check out our web site.


8. So you just touched upon your sort of home base in Bristol. What is it like doing a tech start up outside of one of the main hubs?

It is fantastic. Bristol is a beautiful city, you can easily get to the countryside, you can go for bike rides, you can cycle to work. And, it’s got a thriving tech ecosystem, and so we are really proud to be one of the most exciting companies that has come out of the region and that is really helping us attract a lot of great talent, from not only Bristol, but also from the wider region, all the way from Birmingham to Cardiff, Chippenham and so on. Yes, it is a really great place to build a business. Actually, a huge “Thank you!” to SETsquared which is the organization that helps us scale it.


9. Much of what is going on in the Docker ecosystem is happening in the Bay Area. How do you manage that?

I am averaging about a week a month out there, at the moment, and I seem to have become immune to jet lag, so I guess that is how I am managing it.

Chris: So that is going to be very helpful.

So, we also hired a CEO out there who is Mark Davis. He is excellent. And we are growing a team out there as well. But, I am going to continue commuting back and forth just as much as I need to and it’s fantastic. Every time I go out there, we build more relationships, and it is going stronger.


10. Do you see the day coming where, as with so many startups, you would grow up and you would move to the Bay Area?

I think it is not necessarily the right thing to do – to move the whole organization to the Bay area. I see a lot of value in having a strong engineering organization that’s outside of the region of the San Francisco rent prices, and the intense competition for talent there. It’s not to say that we are not going to hire out there – we are. But it definitely makes sense to have a substantial engineering team somewhere else on the planet, and there are beautiful places like Bristol to do it. So why not?


11. [...] Do you see more of that happening?

Chris' full question: It feels like we’ve got an amazing mini-Docker ecosystem growing here in the UK. Not just London. The rest of the UK broadly. The Orchard lab guys with Fig, acquired by Docker now but becoming sort of the heart of Docker’s office here in London, and Docker Compose, and the Weave project and Weaveworks to build that out, and of course ClusterHQ. So, do you see more of that happening?

Definitely. It’s funny. On Tuesday night we were at the Docker London meet-up and I think there were over 300 people out there. It was the largest Docker meet-up ever, anywhere on the planet, and I think that goes a long way to show that London and the UK more generally can be the second most important place for the Docker ecosystem, if not possibly even the most important. So, yes! It is great to see that activity on our doorstep.


12. [...] What else do you think people are going to want to use extensions for?

Chris' full question: When you talk about extensions - and you have built a prototyping framework for that with Powerstrip - the obvious uses of extensions are for network and storage and clearly you have an answer to that with Flocker for storage and the Weave guys have got an answer to that with Weave for networking. What else do you think people are going to want to use extensions for?

I see a lot of people interested in Powerstrip for policy and in particular, being able to control what people are able to run. For example, bind mounting a version of the Docker socket that allows a container only a read-only view of the universe. It can do GET requests, but it can’t do any POSTs. That is definitely a class of API-specific extensions which could be very valuable, and then another huge area is security. I have been speaking to quite a lot of people who are interested in this. I mean, it is sort of related – access control and security. But there’s lots of people interested in building security products around Docker, and having that sort of fine-grained control over the API calls is important for that.
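
The read-only policy Luke mentions boils down to a very small rule: let GET requests through to the Docker API and refuse everything else. A minimal sketch of that rule is below. It does not use Powerstrip's real adapter interface; the function name `filter_request`, the returned tuples and the choice of a 403 status are all assumptions made for illustration. The two paths shown are ordinary Docker remote API endpoints.

```python
# Hypothetical policy: only these HTTP methods count as "read-only".
ALLOWED_METHODS = {"GET"}


def filter_request(method, path):
    """Decide whether a Docker API request may pass through the proxy.

    Returns a (status, message) tuple. A status of 200 means "forward the
    request unchanged"; 403 means "refuse it". Both conventions are
    assumptions of this sketch, not part of any real API.
    """
    method = method.upper()
    if method in ALLOWED_METHODS:
        return (200, f"forward {method} {path}")
    return (403, f"denied {method} {path}: read-only view")


print(filter_request("GET", "/containers/json"))     # listing containers
print(filter_request("POST", "/containers/create"))  # creating a container
```

A container given a socket that sits behind a filter like this could inspect what is running but could not create, start or remove anything.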

Chris: Cool. It has been great to have you here, Luke. Thank you very much for stopping by.

Thanks, Chris. Awesome. Let’s go and have a beer.

Mar 26, 2015