LinuxKit

Avi Deitcher talks about LinuxKit, its history and purpose, and how it differs radically from the familiar operating system distributions. He delves into LinuxKit's design and architecture and explores how LinuxKit offers new ways of operating, plugging operating systems as first-class citizens directly into deployment pipelines.


Avi Deitcher is Managing Consultant at Atomic. He has been an engineer and businessman for over 20 years, designing and implementing technology, strategy and operations. He uses his time helping clients implement technology solutions that fundamentally change how they operate, and invests time in building and contributing to open-source that has the potential to affect how enterprises operate.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Deitcher: Why Composition Isn't Just For Music. And for what it's worth, I'm terrible when it comes to making any music; I have tried my hand at it. Let's just set some goals and some standards here to start. I'm not going to dive too far into how this thing works deep down under the covers; some of the OS track talks later today will. The goal here really is: what is this thing, a little bit of how it works, and at the end of what's about 40 minutes or so, give or take - speaking of which, I didn't start my timer, so I'm going to do that to be sure I don't run over on you - at the end of this, you should be able to say, "Oh look, I can build a runnable image in 90 seconds flat. Fantastic!" We're going to stick to the practical side.

First survey. If I don't see hands on this, I'm definitely in the wrong conference. I assume everybody here in the room, put your hand up if you've ever written or maintained software. Oh good. Otherwise, I'm really in the wrong place.

Let's take it a step further. Version 1.1. Who here has written or maintained, what we used to call proprietary or commercial, now we call closed-source software? So pretty much everybody in the room. Now, hopefully, no hands go up for this. Who here would be thrilled to pieces if their proprietary or closed-source software were suddenly exposed and open and nobody had to pay for it?

Well, unless, of course, you really hate your employer, and I know a few people have been like that. But we're going to stay away from those.

I'm going to share a story about how I got involved with LinuxKit, and therefore what it means, what it is, and how it works.

A few years ago, I'd say about four years ago, give or take, a friend of mine named Andy, who ironically was doing his doctorate in engineering when I was doing my undergrad, so he actually taught me, said he had a business. For about 20 years, he'd been in the business of taking files in all sorts of weird proprietary formats, sometimes on tape, or optical media, or sometimes on drives in really weird places, and converting them into fairly standard formats. A lot of it is audio and video. Why? Well, you're running a company and you're the chief legal officer. You've been recording, I don't know, your online chats for five years between you and your customers, or your calls into the conference center, and lo and behold, you get sued. So fantastic. For discovery, we need to produce everything over the last three years. Great, we'll go pull it off tape. Shoot, it's some weird proprietary format. The tape itself, everything's encrypted in a strange way. How do we get it out? There's a whole niche industry of people pulling data out of weird, commercial, odd proprietary formats and converting it. So he is essentially the "Little Converter That Could". A nice little machine that will take these tapes and convert them over. It's great.

To some degree, it's a consulting business - usually paid per tape, per file, per whatever it is - but to some degree, you have no leverage. Now, there are a lot of financial firms here in London. Leverage is your ability to boost whatever returns you're getting. In the consulting business, you can only sell as many hours as you have: 2,000 to 2,500, or, if you're going to kill yourself like a New York lawyer, 3,500 hours a year, at whatever rate you're going to charge. But he wanted to get some leverage out of this.

So the way you get leverage is things operate themselves and you grow it. There are two ways you can go about doing that. One, write it as SaaS, put it in the cloud. Fantastic. You put it in the cloud, people upload their files, they convert, they pull down, it's great. It didn't really work, for two reasons. One is those tapes that you saw are absolutely massive. The amount of data to be uploaded is huge, and it isn't so easy to get them up. You have to find some kind of reader to read these weird odd tapes, or optical media, or whatever.

The other is that many of these companies, like your typical financials here, go, "Our tapes are sitting right over here. These are my premises; your cloud is over there. You may not remove these. These are the only copies I've got. I've got compliance reasons. No, we're not removing this stuff." So SaaS didn't work. So the other way to make things self-service is, well, stick an appliance on premises. If you're such a big on-premises place, we'll stick in an appliance.

That kind of creates two problems. Remember that this is a niche industry. This niche industry is not Cisco. I can't deploy billions of dollars or pounds worth of equipment on site and, by the way, pay $50 million a year to my lawyers to make sure you don't try to reverse engineer my stuff. And Cisco - I'm not picking on Cisco, it's just an example - with Cisco, you're paying as much for the hardware as for the software.

So, the first problem is how do I protect 20 years of engineering from two weeks of reverse engineering? I happen to have seen his code. It's brilliant. But I could reverse engineer it in a few weeks if I had access to the source, and then I don't need to pay you. The second is once I'm deploying an appliance, I'm kind of limited by what I've deployed. It's very expensive to re-deploy, so optimal running is really important. I can't just say, "Well, we're going to take a larger cloud instance, or take another instance." It doesn't work that way. How do I minimize the overhead so the deployment lasts? I'm locked into this device for as long as possible because I really don't want to go out there and re-deploy.

Well, it turns out you're not very special. This is a problem everybody has in applications and especially in operating systems. How do I optimize my environment and how do I secure it? It's pretty straightforward.

Well, if you're not special, at least I like to think I am, so I'm allowed to take 10 seconds and tell you who I actually am. I spent about 10 years doing mission-critical IT, mostly in financials. I spent a lot of time right here in London, both in the City and Canary Wharf, and 12-plus years doing consulting with a bunch of startups in the middle. For what it's worth, I'm always an engineer at heart. That was the first thing I actually owned and wrote on. I don't remember any 6502 assembly, but I used to write in it.

I love ice hockey. I missed a game last night to be here. But if anybody here is into ice hockey, definitely talk to me. Most people don't know, I believe the largest ice rink facility in the world is in England; if I remember correctly, eight rinks in one place. You'd think it would be in Canada, but it's not. And I love great engineering, but it has to matter. Cool engineering, I will spend 10 minutes on, an hour. Engineering that can change how things operate for businesses or people, you've got me.


This is my evolution of involvement with what we're talking about today. Well, we needed to solve Andy's problem, and we needed to solve lots of parts of it, one of which is how do I keep you from getting into my operating system? I've installed this appliance on your premises. I don't trust you, so we start by doing things manually. But of course, that ends up being a lot of repetitive work. What's the old line? A great engineer is lazy. They hate to do the same thing twice.

I ended up working with Packer. Well, lots of people use Packer to build things. Packer is great. I owe a deep debt of gratitude to HashiCorp for it. The problem is, it is slow. It launches an OS, does all these installs, builds things according to your scripts or your definition file, and it takes 25 minutes to deploy. Kick it off, walk off, get a nice mug of tea - because I think I'm one of the few people who grew up in North America who doesn't like coffee. Get a nice mug of tea, you come back, and 23 minutes out of the 25, it blows up. "Oh, look, I think I found it. I fixed the problem." Kick it off again, and it blows up again 23 minutes later.

For most people here who've written software, how many iterations does it take to fix even the most minor bugs? Five, 10, to get it right. If your cycle time is half an hour, you're dead. It's just a waste of time. So I said, "Well, I can do this better." The old line with the five most dangerous words in engineering: "How hard could it be?" And I said, "Well, it's sort of like a Dockerfile; I can do this using OS images, and kind of build together the bits that make an operating system." So I actually got pretty far with it, but I was doing a lot of custom stuff. Hot rods are great custom cars. They're a lot of fun. But if all I want to do is get from here to there, I really don't want to maintain a custom car. So, I was struggling with what to do with this, and I ended up in this city - who knows what city this is?

Participant 1: Berlin.

Deitcher: Yes, Berlin where I met Justin. He says, "Let me tell you about this project we're working on. It will be open source at one point." And that was LinuxKit. And that was essentially doing what this did, but in a much, much better way. That would be the end of the story, except Justin discovered that if enough cups of tea and hot chocolate are bought for me, I can be convinced to actually maintain this thing. And so I have been for the last few years, not as an employee of Docker, but as an independent.


What is LinuxKit? Well, it says here, I can read it to you: a toolkit for building secure, portable and lean operating systems for containers. Yes, yes. What it is, is a series of pieces of Lego. I love Lego, by the way. I have been playing with it since I was a very little kid, and I continue to. It's a way to build exactly what you need into a bootable, runnable operating system image. Why? What's wrong with existing distros? Nothing's wrong with them per se. But if I look at an operating system, it kind of looks like this. I don't remember if this is Amsterdam or Copenhagen - it could be either - but it's got all these great things. It's got a bus, and boats, and people, and buildings, and I'm thrilled; it's got everything I want. I could always add a few buildings with a few quick commands like apk, or apt, or yum, or whatever. But if all I really want is that bus, this is a hell of a lot of stuff to get, just to get that little bus, if that's all I need. So what is LinuxKit?

It's a very easy way to build runnable, disposable, and immutable images. Let's see how much luck I have - does anybody know what this is?

Participant 2: Ayers Rock.

Deitcher: Good. The last time I presented this I did a few practice runs. People were close. Somebody got Arizona. I'm very glad people got Ayers Rock. Ironically, I've been to Australia twice for an extended period. Neither time did I get there. I have to get there.

So what? I can build these images. So what about it? Yes, I will get to building them live on stage and hope that the demo gods don't come after me. First of all, there's the size. You get, relatively speaking, a very small image. It has exactly what you need in it and no more. That actually matters when you're pushing bits around. Your start time is, relatively speaking, faster. Why? You can't do much about the kernel itself; it's going to take however long it takes. But everything else that starts up afterward, until the service I really care about - that Nginx, that Tomcat, that Go program, whatever it is - runs, is minimized, so my start time is faster. That very much affects things in a world where things are disposable, starting and disappearing quickly.

Your cycle time is faster. When I did stuff on LinuxKit, I'd kick off a build. I can't even start to get my cup of tea, and boom, it's done. "There's an error, I'll fix it." Five minutes later, I've done 10 iterations, 20 iterations, I've fixed my thing and I've moved on.

Debuggability. Your app is your app. But when you have complex issues to debug, if I have this, and this, and this, and this, and this, and this running on my operating system, it's a pain in the rear to figure out what the heck is going on that's stealing resources. I have dealt with this many times. On the other hand, if I am running just the one thing I care about, or one or two, it's much easier to find my problems and debug them. Performance. You want to optimize what you're running. I used to call this checklist hell: a list of things to run through to make sure that I have optimized, disabled, removed, done all the other things that are running on my operating system. Forget it. If all I've got is what I care about, the rest becomes, relatively speaking, easier.

And last but not least, of course, is security. What are my attack surfaces? The more I have running, the larger my attack surface is. Have I remembered to remove everything? Do I add this or remove this? If all I've got is what I care about, that's all I have to worry about.

One side point: every picture you'll see here - and I love pictures in presentations - is either Commons-licensed for reuse, or I took it myself. This conference, by the way, is very good at that, and it's been emphasized many times. This is the only one that I actually got an explicit license for. I just thought it was such a great emblem for security. And I reached out to them, saying, "I'd really love to use your logo." I fully expected that, being a highly secretive government agency, there's no way they'd say yes. Three hours later, I got an email from somebody named Christopher saying, "Go ahead and use it. As long as you don't use it in a negative way or imply that we're endorsing your product, go for it." So I have to thank GCHQ for that. It was very unexpected.

And finally, let's put it in context. Many people here probably know the various products that are involved, probably know what this is, but when I did some practice runs, people suggested, "Let's raise some awareness of what this is." People probably know what containers are. Okay, they're a legal fiction, but theoretically, they're a way of isolating processes - applications that are running on an operating system. Images are ways to package them up. containerd is a runtime - sort of; it's a runtime daemon, but it's used along with runc as a runtime to run those containers. Moby is the open source project to collect - I'm going to get this wrong, and if there's a Docker engineer here, which there is, please correct me - the various open source components that you can use to compose runtimes. And Docker, of course, is the company that started a lot of this. It is also the company behind LinuxKit, so thanks to them. And there is LinuxKit.

Use It

So what does it take to use LinuxKit? What do I need to do to actually build myself lightweight, composable, runnable, immutable operating systems? Well, there are two parts to it. First, there is a single command; it is Go, compiled to a single binary - not lots to install. This is the LinuxKit help. It turns out that there are all of eight options, which is not all that much to start with. Well, one of them is help and one of them is version. We're down to six. serve and run actually run it somewhere; those are pretty much test-time options. I'm not going to use those in production. I'm not using those to build anything. push is pushing it to a store, pkg has to do with package building, metadata is information. So guess what, there's only one command that really matters: "linuxkit build".

By the way, I used to go like this all the time, and then I'd say, "Yes, look. You can make a red line. It'll run for you." So you've got one binary that has one command that really matters, and you've got a config file that tells it what to install. I'm going to dig into this a little bit later, depending on time, but what it comes down to is all of about six or seven sections in a file that tell it how to build an operating system. That's fantastic.
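Those six or seven sections look roughly like this - a minimal sketch, with illustrative image tags (the `<tag>` placeholders are not real versions; the examples in the LinuxKit repo pin real ones):

```yaml
# Minimal linuxkit.yml sketch -- section names are real, values illustrative
kernel:                      # exactly one kernel image
  image: linuxkit/kernel:4.19.20
  cmdline: "console=tty0"
init:                        # unpacked into the root of the image
  - linuxkit/init:<tag>
  - linuxkit/runc:<tag>
  - linuxkit/containerd:<tag>
onboot: []                   # one-shot setup containers, run in order
services: []                 # long-running containers: the bits you care about
onshutdown: []               # run on a clean shutdown
files: []                    # extra files copied in at build time
trust: {}                    # image signing/trust configuration
```

Building it is then the one command that matters - something along the lines of `linuxkit build -format iso-efi minimal.yml`, though the exact flag names vary between versions, so check `linuxkit build --help`.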


What are the formats that it can output? Well, if you want to run an operating system, you need an image that can be run in a cloud or locally, and it has just about every format you'd like: ISO for BIOS and EFI, just the kernel and initrd, squashfs, qcow2, raw disk image, tar, and so on, plus specialized versions for AWS and GCP, which are really just standard versions with some special requirements they have to run things. If you want to run it, you can run it. We've done stuff on Pi; deployed images run on the Raspberry Pi and make you happy. And we have support on just about every architecture. It's popularly used.

How long does it actually take to build, since I spoke about cycle time? I ran this twice on my home network. The first time, it took 1 minute 35 seconds. You'll notice that it's pulling down lots of images from places, so that probably could have been faster if my home network were just a little bit faster. My home network itself is probably not the problem, since I'm running lots of UniFi hardware, because somebody in this room somehow convinced me to spend a lot of money on it. When I re-ran it, with everything already cached, it took me 44 seconds to build a standard image. That was it. That is fast. I cannot get the boiler to make the water hot enough for a cup of tea that quickly.


Let me see if I can actually get a demo up and running here. I have to exit this and then go there. Here we go. I definitely need to make this larger. Is this readable or should I make it somewhat larger? Make it bigger, okay. I can't even see what I've got here. A little more sane? I took the standard examples. This is the standard example you'll see right in the repo if you clone the LinuxKit repo; that's later in the presentation. This is the standard one. It's also the one I put up earlier for you to see. It's probably easier if I "vi" it, because it will color-highlight it. But it basically just starts you up an operating system image and gets Nginx running. That's all it really does.

If I want to build that, which I shall - I should have run that with time; we'll kick it off again - I would guess about 35 to 40 seconds, and this thing will actually build a runnable image, and I will run it too. The steps that you see here, I will, time permitting, take some time to run through - what the various components are. And go to his talk later, by the way. It's supposed to be fantastic.

Participant 2: I noticed that there is a technical [inaudible 00:17:43] on what capabilities you give the instance. So, how do you figure those out?

Deitcher: You mean this part right over here? It depends what it actually needs. Under the covers - and I'm going to get to that - under the hood. The Americanism for where a car engine lives is under the hood. Is there a different term in British English? Where do you keep the engine in a car?

Participant 3: Under the bonnet.

Deitcher: Under the bonnet. I would have changed the slide. I will change it before I upload it. You actually need to know what your application needs. In this case, it's the same problem you'd have if you ran it in Kubernetes, or in Docker on your desktop, or in Compose, or in Swarm, may it rest in peace. So it's the same basic question you'd have there. It actually built, but I'll run it again with time, because that would have been smart of me. Actually, as I said, I cheated. I downloaded everything in this image, all the images, beforehand, because I didn't want to get stuck on the Wi-Fi network here - look, we're stuck, it doesn't work. It'll do the same thing again, so it will just take about 45 seconds, and then we'll run through some of the pieces that are in here. What it's basically doing is taking the various components, the raw materials it needs to make that operating system image, and putting them together. And, yes, you can do it too.

It's just creating our outputs.

What did it take? 33.981 seconds. I didn't notice the .081 difference. I've also got here, let's see, that one was the standard LinuxKit one. I've also got one with Redis on it and WireGuard. I'm happy to build those two if we'd like to see - here we go. Much nicer with the colors on it. It's pretty similar. I said I'll dig into the details later, except I've got WireGuard installed on this. The WireGuard people have been very active with LinuxKit from early on, and we appreciate it. But you'll see that building an image that doesn't have Nginx on it, it's just got WireGuard on it, also takes about 35 seconds. And yes, I cheated. This morning I downloaded all the components that go into it, all the images.

I forgot - in my notes it says, while this thing is building, take a drink of water. By the way, those signatures - ignore them. They have to do with how the images are signed and the trust part. I don't have time to go into it today and, to be perfectly honest, every time I go into it, I get myself confused. But it has to do with the signing of the images. You actually have the capability of saying, "I only want to trust these signed images."

Under the Hood

I'll pop back into the slide presentation. Okay, good. It worked. So under the bonnet, under the bonnet, right. I didn't even put the word hood on it, that's fantastic. This, of course, is a Ferrari. If I looked for something that's really fast and capable, a Ferrari came up. I thought about doing a Lamborghini, but I've always found a Ferrari to be a much more beautiful car. I had a neighbor, last time I lived in New York, who was a podiatrist so didn't deal with insurance, all cash business. And he inherited his New York City apartment from his father. He was single, without kids, well into his 30s. So what he did for fun is he had leases on a Lamborghini and a Ferrari. I have a hard time saying that's what we do for fun, but I had a very good time going in those cars. He once let me sit in the Lamborghini's driver's seat. I was, "Wow."

We'll look at four pieces here: the engine systems - the components that make it up - the driving controls, which we'll dig into a little more deeply, and the manufacturing process. So, engine systems. Look, when you run any operating system, you're going to have a kernel, which is going to kick off an init. That happens here as well; it's no different. What happens here, however, is you have three additional pieces. You have the "onboot" containers, which are things that start, run one time, and then exit - things you want to do to set things up at the beginning. You have the "services" that you really care about. These are things that run long term: your Tomcat, your compiled Go process, your Node process, your Nginx, your whatever it is, Spring Boot. And you have your "onshutdown" containers, things that happen at the end.

Now, a few things to note, besides the fact that I apparently capitalized parallel but not sequential. One is your “onboot” processes are sequential, one, the next, the next, the next, the next, it does it in order. Those are not meant to be long-running, those are meant to go forward and end. A lot of operating systems that boot like that can get confusing as to what's long-running and what's not. This keeps it very clean. When you shut down, assuming you shut down cleanly, you have the same. The things you really care about are these services. I'm running one Nginx, one Nginx and two Tomcats, one Nginx, two Tomcats, three compiled Go processes, and Node, and Ruby, and whatever else.

The second thing to note is that all of these run as containers. When that GitHub page says "for containers", this is what it means. Your "onboot" containers are all run via runc, while your "onshutdown" containers and your services are run via containerd. Essentially, it means they're run the same as if you'd run them under Docker, or close enough. This actually leads to a very small, well-contained operating system with not a lot shared between them.
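As a sketch of how that lifecycle maps onto the config file - the names here follow the standard LinuxKit examples, and the `<tag>` placeholders stand in for real pinned versions:

```yaml
onboot:                  # sequential: each runs to completion before the next
  - name: sysctl
    image: linuxkit/sysctl:<tag>
  - name: dhcpcd
    image: linuxkit/dhcpcd:<tag>
    command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
services:                # long-running, supervised by containerd
  - name: nginx
    image: nginx:alpine
onshutdown:              # only runs on a clean shutdown
  - name: shutdown
    image: busybox:latest
    command: ["/bin/echo", "so long and thanks for all the fish"]
```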

So what are the components? You'll see, by the way, I tried to stick to the color scheme here, so as these re-appear throughout the 10, 15 minutes we have left, I'll try to stick to the same scheme. You have these five pieces that go into this magic hat, which then runs through LinuxKit build, and it gives you a bunch of outputs in whatever format you like. You have the "kernel"; you have one of those. You have the "init"; the init is, well, your init process, but it's also anything that has to be deployed on the base of the image itself, because you need it there for various reasons. I'll show you examples in a moment.

You have, as I mentioned, the "onboot" containers. You have the "services", and you have things that run at shutdown - again, assuming you shut down cleanly. Obviously, if the thing blows up, you don't get to run your "onshutdown"s. The "kernel" is a single OCI image or, as we used to call them, Docker images. That's the format it expects to find it in. It can pull it from any registry or locally, but that's what it is. So the kernel is packaged up, along with any of its modules, drivers, anything you need, into an OCI image. Your "init" is one or more OCI images. As a matter of fact, it will always have to be at least three to run correctly. You'll need the init itself, and you'll need runc and containerd in order to actually run your "onboot", "onshutdown" and your "services". Yes, that's why I put exactly three there.
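A hedged sketch of those two sections - the kernel version is the one he shows on the slide, while the `<tag>` placeholders are illustrative rather than pinned versions:

```yaml
kernel:                            # exactly one OCI image, modules included
  image: linuxkit/kernel:4.19.20
  cmdline: "console=tty0 console=ttyS0"
init:                              # one or more images, unpacked into /
  - linuxkit/init:<tag>            # the init process itself
  - linuxkit/runc:<tag>            # runs the onboot and onshutdown containers
  - linuxkit/containerd:<tag>      # runs the long-lived services
  - linuxkit/ca-certificates:<tag> # certificates, so you can pull anything
```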

Your "onboot" is one or more things run sequentially. Also, guess what? OCI images. A few years back, I was sitting with Bryan Cantrill, who is the CTO of Joyent and who, by the way, is a very entertaining speaker. Watch any of his stuff. You may agree or disagree, but it's a lot of fun. He had this line to me - this was early in Docker's days - where he said, "Docker is to yum what yum was to tar, what tar was to copying files." That is, essentially, a fantastic packaging format. I package everything up exactly how I want it, add lots of metadata, and everything's great.

To some degree, that's what we're doing here. We're taking a really good packaging format and saying, "What if we just built operating systems from it, by composing the bits just the right way and making a wonderful symphony?" Your "services" are also OCI images. Now, that gets really easy because, guess what, most of the services we like to run nowadays are, for the most part, already available as Docker images somewhere - it's supposed to say OCI images - and your "onshutdown"s are images as well.

What are your driving controls? That stuff is supposed to be in the background. Go figure. I had to stick with the Ferrari, and I made a specific point of taking the Ferrari with the wheel on the right-hand side. I'm going to dig a little bit into this file, but after everything we've done in the last five minutes, it should be fairly straightforward. First, you've got your "kernel"; that's the grey color. Your kernel is an OCI image. This here is the LinuxKit kernel, which means, by default, it's going to the Docker Hub. Yes, as part of the LinuxKit project, we actually package up the basic things you're really going to need, including a kernel. This is kernel 4.19.20. It's hard to see here because of the way Keynote decided to do this; I'll flip it away for a sec. But you need a command line as well. Whenever you boot, you've always got one; you usually have a command line on it.

Next is your "init". These init entries are also images. You don't necessarily have to pin a hash; we have versions on most of these. You need your init - your kernel's done - and it will look for a specific init binary, or really a process, to run in it. Then runc, containerd and certificates. There isn't a whole heck of a lot here. We need runc and containerd to run "onboot", "onshutdown" and "services", and the certificates are obviously useful in order to get anything.

Your "onboot" containers are, as we said before, sequential things - two of them here. Your "onshutdown" as well. I really don't remember who put this in. It's been there for a while, but somebody clearly likes - can someone place this? - "So Long, and Thanks for All the Fish." Douglas Adams. Yes, not a lot of people read it nowadays. It is a great series. A little too whimsical, but it's great. And last but not least, your "services", which are the things that you really care about. Here I'm running getty, because if you want to log in at all, you need getty or sshd - or telnetd, but I don't think anybody runs that nowadays. You need something to get in. I highly recommend, when you run in production, don't use "insecure=true". That's a nice little thing that says "don't ask for a password" when you log in on the console. If you do that, and you get breached, don't say I told you to do it, because I'm telling you not to do it. It's not my fault.

One of the vendors downstairs, as I walked out of the keynote at the very end of it, had one of the screens with the rotating signs - one of the security vendors - and it said, "Don't be the next Equifax." It's great. Here are the one, two, three services you're running: getty, rngd, and nginx. It's going to run nginx - the part that's over there. These are Docker images. They're running in containerd, so if you need them to have certain capabilities, you have to explicitly grant them. And last, we're putting "resolv.conf" there so it can run.
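The services he's describing look roughly like this in the YAML - the `<tag>` placeholders are illustrative, and the capability list is the sort of thing you'd tune per application:

```yaml
services:
  - name: getty
    image: linuxkit/getty:<tag>
    env:
      - INSECURE=true        # console login with no password -- demos only
  - name: rngd
    image: linuxkit/rngd:<tag>
  - name: nginx
    image: nginx:alpine
    capabilities:            # nothing is granted by default; be explicit
      - CAP_NET_BIND_SERVICE # bind to port 80
      - CAP_CHOWN
      - CAP_SETUID
      - CAP_SETGID
      - CAP_DAC_OVERRIDE
    binds:                   # host paths mounted into the container
      - /etc/resolv.conf:/etc/resolv.conf
```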

Finally, if you want to copy files over at build time, just like you might do, for example, with a Dockerfile, you can explicitly tell it, "I'd like to put these files in." There's a whole series of ways you can put things in - I don't want to get into it too deeply - and they're put in the base of the system. I'm not going to get into the trust section, but it has to do with Docker trust and images. And last but not least, the manufacturing process. Let me do a quick check of time. Good, doing great. I'd ask if people recognize where this is from, but the tail kind of gives it away.

Participant: Boeing.

Deitcher: Yeah. Has anybody visited that plant? This is fantastic. "The Future of Flight". I think it's called "The Future of Flight" or the "Future of Flight" tour in - is it Everett? Is that the town? - Everett. It's about half an hour north of downtown Seattle. It is fabulous. It is so much fun for people, especially if you like engineering. This is just awesome. You actually kind of go, if it's not this side, it's the other side, they take you through the tunnels underneath and to the top. You see the production line for the 777, 787, I think the 747. They have two lines for the 787, so one of the two lines for it. It is tons of fun. It really is. I highly recommend it.

In any case - if people don't know - the 787, and I don't know if they retrofitted the 777, was the first line they did like a normal assembly line. Before that, they'd have a place where they're building a 747, and they'd bring all the parts to it and build it. I don't remember if they retrofitted the 777. With the 787, you can see it's going along like a normal production line, like the images we have of a Ford plant or Aston Martin cars moving along the line with things being added. This one right here is getting the wings attached and the landing gear, which is why it's being held up by these blue things - the gantries, I guess. You can see this one over here already doesn't have them, because it can stand on its own wheels - or maybe it's the next one. It is really cool. I recommend it. I got to geek out for a minute.

So the manufacturing of LinuxKit is a lot of fun too - not quite as fun as building a 787, but it requires a lot less capital. This is the LinuxKit build I did before. I kind of cut off how long it took, but it's about the same. We'll run through those same steps. Your net result is obviously a file, or series of files, in the format you asked for.

As an intermediate step, it's going to give you a directory on your local machine - wherever it is you're building: your laptop, the cloud, whatever instance you're running it on - and it's going to put everything in the exact layout that it needs in the final image. So this here is my intermediate directory. The first thing it will do is take that kernel image and extract it into / - the kernel itself lands in /boot/kernel, because that's where things are expected. Then it adds the init containers - I was afraid it was failing at just the right moment. It pulls them all out and extracts them all into /. Your init containers are not just the place where you say, "How do I get my init?" They're also the place where you say, "There are things that need to be at the root level of the operating system, not inside my services containers."

Then I'll add my “onboot” containers, and my “onboot” containers go into /containers/onboot/NNN-name, where NNN is the sequential number. So here you'll see this blue “000-sysctl”, “001-…”. Names can conflict - it's not an issue, because it runs them sequentially, as we discussed earlier. Then it adds shutdown containers with the same naming. Next, it adds your service containers - service container names must be unique there. And last but not least, it takes files from your local file system and puts them in. When it's done with all of that, it gives you a tar file. I'm kind of lying - it doesn't really give you a tar file, it gives you a tar stream, but it doesn't matter; for all intents and purposes, I've got a tar of everything that's here. I now basically have this and say, "How do I get this into the final image I care about?" And we go into finishing. I have this image of somebody detailing a Ferrari.
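The sections we just walked through map one-to-one onto the build YAML. As a rough sketch - the image tags are placeholders, not pinned versions, so treat this as illustrative rather than a tested config:

```yaml
kernel:
  image: linuxkit/kernel:<tag>    # extracted so the kernel lands in /boot/kernel
  cmdline: "console=ttyS0"
init:                             # unpacked straight into / at build time
  - linuxkit/init:<tag>
  - linuxkit/runc:<tag>
onboot:                           # run once, in order: /containers/onboot/000-..., 001-...
  - name: sysctl
    image: linuxkit/sysctl:<tag>
services:                         # long-running; service names must be unique
  - name: nginx
    image: nginx:alpine
files:                            # copied in from your local filesystem
  - path: etc/issue
    contents: "built with linuxkit"
```

Each entry under init, onboot, and services is just an OCI image; the builder pulls it and lays it out in the directory structure described above.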

Before I got into electrical engineering, I actually wanted to go into aeronautical engineering. But it's too dependent on a few large companies, and I refused to be bound by that.

It takes this tar and, through its own native Go code, gives you lots of different output formats. And I don't even have everything here: ISOs with BIOS and EFI boot images, BIOS and EFI VHDs, qcow2, initrd, kernel+*, you name it, it's probably there. If there's something you need and it's reasonable, open a pull request, or at least an issue, and we'll take a look at it.
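To give a concrete feel for the finishing step, here's a hedged CLI sketch. The exact format names and flags vary between LinuxKit releases (older versions used -output rather than -format), so check `linuxkit build -help` on your version; the spec file name is hypothetical:

```shell
# Build several artifact formats from one YAML spec
linuxkit build -format iso-efi -format qcow2-bios -format kernel+initrd minimal.yml
```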

Now, I'm kind of cheating a bit. Everything that happens in LinuxKit happens natively in that one binary, except for two things that still fork out to Docker - which means you need to have Docker installed - and, to some degree, to LinuxKit itself. We're in the process of trying to get rid of those. One of them I'll mention later; if I don't get to it, feel free to ask - Justin is working hard to get rid of it. The other is that building these images is actually not so easy to do in code. There are userspace utilities for doing these things. For example, if you're doing EFI, your EFI partition needs to be formatted as FAT32. There are lots of great userspace utilities to do that, so you go out to use one - but it might not be installed. So you end up doing a “docker run”: you call Docker to run an image that has the utility, and it does your bit conversion for you, essentially.

To build the image itself, if you need ext4, there aren't really good userspace utilities for that, so you need to mount it and copy files over. Which means we go ahead and call Docker, and we even call LinuxKit - which sort of feels like that movie with the dream drug, Inception: LinuxKit build calls LinuxKit in order to build LinuxKit images. We are in the process of trying to get rid of this, so that it can literally build all of this anywhere, because when it comes down to it, these are just bits in a file. If you can lay them out the right way, why do I need the kernel?

So I've been working on this for well over a year: to essentially have libraries in Go - and hopefully, eventually, other languages - to lay all these things out entirely natively. If anybody feels like helping, I'd love to have it. If anybody feels like masochism, we'd love to have it. If anybody wants to criticize, as long as it's constructive, go for it. We're actually pretty far along.

The Future of LinuxKit

What is the future of LinuxKit? Let's look at the crystal ball and hope I'm right. Like I said, we want to eliminate the outside execs to Docker and to LinuxKit. We've got them in two places. That's one. The other place is where we actually pull the images down and extract them. We shell out to Docker for that, even though the functionality is in the library that Docker itself uses. It's in the process of being gotten rid of.

Next, greater service composition. Those services right now are completely independent. You can get them to behave in certain ways, but to get them to do things together is actually not that easy. You could say, "Hey, look, you're building this tiny, immutable, composable thing. If you have two services, build two images." That's sort of like when we say, "Every container should have exactly one service." You're right… in an ideal world. The world is never ideal. There are real use cases where things need to participate nicely together. I'd also like to get rid of the UEFI and BIOS bootloaders - we have bootloaders in there, and we'd like to boot directly instead, eliminating one more piece. That came up again yesterday, or two days ago, when we had problems compiling something.

Increased TPM/TPM2 support. Has anybody here worked with TPM and the TCG? TPM is - it's the Trusted Computing Group's specification; the TCG is really just sort of an Intel spin-off. It's supposed to be a chip - recently there are firmware implementations too - that provides certain basic security functions at a hardware level: certain kinds of key management, certain kinds of encryption, measuring the state of your system - which is what got me involved with all of this - and remote attestation that nobody has actually changed your system. We have some support in here for TPM, mainly 1.2; TPM 2.0 we'd like to get to. If I remember correctly, Google Cloud has recently started providing support for TPM, so you can actually do all sorts of interesting things like safer encryption, safer key management, and attestation. CoreOS in its day was doing some really great work on this. Unfortunately, it no longer exists - as a company, anyway.

Remove Alpine entirely from the base image - there are still a few areas where Alpine itself is installed in the base image. And, open question: compose what? We've taken this big operating system, where I had to take the whole thing and remove or add lots of pieces to get it right, and instead ended up with a place where I can say, "I want that bit, that bit, that bit. Munge them together, compose them, and make my image." Now that I've done that, what's left? What is in there that I can compose to make interesting things? I bet Per [Buer] has some very interesting ideas on what you can compose, like kernels - and now you're way out of my depth. But what else can you compose? We have some ideas, and we'd love to hear more: what else can we do to make this lighter, more secure, and faster?


So to summarize: operating systems can, and should, be like applications. You write software - in what language?

Participant 5: Go.

Deitcher: Go. You write a Go app, you deploy it, you discover a bug, and you need to add a new feature. Would you ever dream of going to your compiled app and changing out the bits in it in order to add the new feature? Hell no. You throw it out, you make your changes in your code, you recompile it. Yet we do that to operating systems all the time. "Look, there's an issue with this library. We're just going to monkey-patch what's out there." We call that “yum update” or something, but that's what we essentially do. We should not do that. Operating systems can and should be like applications: minimal, targeted for their goal; composed - put together what you need; immutable - throw it out when you're done. I actually have seen people monkey-patch applications, actual running jars or binaries; it always ends badly.

Last but not least - and this is why I like it so much - CI/CD-pipelined. I actually have the ability to make my operating system images part of my pipeline, because they're fast to build and don't require all sorts of weird privileges or things like that. That actually changes how I deploy operating systems in production at a large scale, or even at a small scale. As I said earlier on, I love great engineering that changes how people operate. This, to me, is perhaps the biggest thing here. It becomes part of normal, sane software practices.
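To make that concrete, here's a hedged sketch of an OS image as a pipeline artifact, written as a GitLab CI job. The stage name, builder image, spec file, and artifact paths are all hypothetical, and the output names follow LinuxKit's name-kernel/name-initrd.img convention as I understand it - verify against your own build:

```yaml
build-os-image:
  stage: build
  image: my-registry/linuxkit-builder:latest   # hypothetical image with the linuxkit CLI
  script:
    - linuxkit build -format kernel+initrd myapp-os.yml
  artifacts:
    paths:
      - myapp-os-kernel
      - myapp-os-initrd.img
      - myapp-os-cmdline
```

The point is that the OS image gets rebuilt from config on every change, exactly like any other build artifact.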

Now go forth and build these things. What about the end of the story? What happened with Andy? Well, Andy still has his appliance, and - I'm not going to get into too much detail - the way we solved our problems is that there's a disk on there, an SSD, of course. We encrypt the entire disk, and you can't boot it without decrypting it. I remember reading somewhere that the way highly secure organizations like GCHQ or the NSA work is that their disks are all encrypted as well, and one or two approved people have to come in at boot time of a system and actually enter whatever secure passwords, or keys, or whatever is necessary.

That isn't doable for this small company. So the encryption key is actually locked in a TPM chip, which will only unlock it if the system is in exactly the same state as it was when it shipped. You can't change the firmware, you can't change the NVRAM, you can't change anything; if anything is changed, it won't work. So we build it, set it, encrypt it, store the key, and send it out. And the operating system on top, of course, is LinuxKit. What's he doing with this? Well, sorry to get sort of crude, but in the end, every piece of software is essentially only useful if somebody can use it. And this lovely thing over here pays for all our salaries, so we're all very happy with it. He's out there selling it.
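This isn't the exact tooling in Andy's appliance - the talk doesn't go into the implementation - but as a sketch, the same seal-the-key-to-machine-state idea can be expressed today on a systemd-based Linux with LUKS and a TPM2 (the device path is an example):

```shell
# Encrypt the data partition
cryptsetup luksFormat /dev/sda2

# Seal an unlock key into the TPM2, bound to PCRs 0 and 7
# (firmware and Secure Boot state); if either measurement changes,
# the TPM refuses to release the key and the disk stays locked.
systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=0+7 /dev/sda2
```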

Thank you very much, happy to take any questions. It's been 42 minutes. We have how long? Five minutes. When that man says we stop, we stop. Ask some questions. Warning: the answer might be, “I don’t know”.

Questions & Answers

Participant 6: So if someone wanted to contribute, is there a contribution process involved?

Deitcher: Yes, we like you. Please do. Open an issue on GitHub. It was on the initial page, or just Google it. We don't actually have a homepage for it other than GitHub, do we? We really should. So if you open an issue, I'll take it on. Go ahead: open an issue, or open a PR, by all means. If you can write in Go, you can do it - and the packages themselves that compose those things don't have to be in Go; they're just containers. We'd absolutely love to have it.

Participant 7: This looks really great for building really immutable systems. Now, building immutable systems is in natural opposition to having remotely updated systems. So this friend of yours, who has these appliances, can he do remote updates and how does that work?

Deitcher: He and I got into a long debate about remote updates. I actually prefer immutable systems for updates, because rather than applying an update, I make a change in my config, it rebuilds down a CI/CD process, and it gets pushed out. That's great if you're Barclays, or Morgan Stanley, or JPMorgan, or the Financial Times, because the toolchain from beginning to end is essentially internal, or at least connected in some way or another.

A lot of his customers don't have internet access for appliances that are deployed on site. There's literally no way to get out there remotely and do it. So that ends up being another push for him to do something like this, where he said, "I'm going to have my kernel and my drivers and the bare things I need in it, because that means I have far fewer things I need to update." Once he actually needs to update, he has to go on-premise. There's no choice. Let me rephrase that: I believe there are one or two customers who are sort of allowed to remote update, but that was after a long back and forth, and we're essentially living with, “Sorry, no updates”. In a larger-scale environment - Justin was in North Carolina a few months back, I believe, and heard of one company using it at large scale. I don't remember what it was for.

Justin: Run their GitLab.

Deitcher: To run their GitLab. In that environment they'd say, "Same as with the software: make the OS part of your CI/CD pipeline." Then you don't need remote updates; you just need a, "Look, I changed this, it deploys it out." I make that sound a lot easier than it is.

Participant 8: Yes. But is there a fundamental conflict between building systems with little escape and having systems be remotely updatable?

Deitcher: It depends on how you define “remotely updatable”. Do you mean updatable in place? Then to some degree, I would say yes - I'm sort of thinking all the variants through right now - to some degree, I would say yes. But if you can make it part of your pipeline, then it's not remotely updatable, it's replaceable. We definitely had discussions around the CoreOS/Chrome-style A/B deployment updates. They never really went very far. I think there was...

Justin Cormack (track host): There are people working on it.

Deitcher: There are people working on it?

Cormack: Yes, there are people working on that - in-place updates. Yes.

Deitcher: I stand corrected. There are people working on it. I'd like to be part of that.

Participant 10: Can you give a sense of the relative sizes of those pieces you've got - the kernel, the init, and your services, like Nginx? Percentage-wise, roughly, how do they compare? Is the kernel a big piece of that?

Deitcher: What percentage of?

Participant 10: The overall image size. The goal is to make these things really small, right? So I guess where I'm coming from is: if you're going to slim down even more, would you be better off trimming down the kernel, or trimming down, say, your Nginx or your server? Do you know what I mean?

Deitcher: Okay, that really depends. Nginx is a great example, but in your production deployment you might use Nginx, or you might have your own custom code that you've written. You might have a 1.5-gigabyte massive service - it's possible - or you might have a 10-meg or 5-meg service. The kernel itself is whatever size the kernel is. I'm happy to bring it up if you want - do we have time for that? I'll do it afterwards; I don't want to steal everybody's time.

As a principle in general, I would love to see the kernel itself be far more modular. I don't necessarily just mean things like unikernels - there are people here in the room who really know that well. I would love to see things like - and I may be fantasizing here - "Look, to run my services here, I want a traditional kernel, but I only need this 30% of it. So can I please build, compose, compile a kernel that's only that 30%, without having to go through the whole kernel compilation process?" That's what LinuxKit did for the OS: "I only need an OS that's 5% of everything a full one's got. I don't want to go in and monkey around with it; I just want to be able to compose it." I'd love to see that. Will it happen? I'm not getting involved with Linux kernel maintainers at all. It's not my world.

Participant 10: That's where I was coming from, exactly. I just wondered whether it's worth it - because if the kernel is only a relatively small part of most of these images, then who cares? But that's kind of it, yes.

Deitcher: As soon as we're done, I'm happy to pop it open and look at what the actual size of the image is - I don't actually know - and compare that with, say, all the modules and drivers and everything else in there. John likes that part. I love this - I can see people's names. This is the first conference I've been to where the badges are big, and they're high enough that I don't have to look at somebody's stomach. I love that. I stand up there and somebody says, "Hi, nice to meet you." Okay, I can lose a few pounds. Please stop looking at my stomach.

Participant 11: One of the targets for finishing was AWS. Is that an AMI that's produced, then?

Deitcher: Yes.

Participant 11: How do you do that? Do you actually have to connect to AWS? And no, it's not like Packer?

Deitcher: No, you don't have to connect to AWS. AWS, essentially - it's not UEFI, is it? - takes a BIOS-bootable image. They have certain specific requirements about the size, or multiples of size, of the image itself; it's an ext4, BIOS-bootable image, I believe, but I could be remembering this incorrectly, because it's been about a year since I dug too deeply into that part. They have certain odd requirements, so if it's not a certain multiple of size, they'll just go, "Sorry, no, we won't boot this."

Cormack: But basically, there's an API that can convert a disk image in an S3 bucket into an AMI, which is what we use. So you don't actually have to run anything - there's no mounting your volume, or any of that.

Deitcher: You said it's not like Packet?

Participant 11: Packer.

Deitcher: Packer, sorry - I thought you meant Packet. I was running through their API in my head.

Participant 12: Hi. Yes, thank you very much for the entertaining talk. I'm just wondering about the kernel image - I'm assuming it's probably got a lot of kernel modules bundled in with it. Is there any mechanism at the moment for shaking out modules you don't need for the services you're running? It feels like relatively easy pickings, rather than actually building a small custom kernel, to just drop a load of the kernel modules that are bundled. Or have you got to create your own kernel image with just the modules that you need?

Deitcher: You have a few choices there. The kernel images that were up there - linuxkit/kernel, whatever the version is - are the ones we deploy mostly raw, maintained for ease of use: "Look, it's there, use it, go ahead." Rolf is one of the maintainers - where is Rolf? At Amazon now, I think; he left Docker about a year ago, six months maybe - and he does a great job with it. It's just a Docker image. I could easily take it, do a “FROM”, and remove a whole bunch of things that I don't want in it, or build just the pieces that I do want. Most of the challenges we get are on the other side of it: what happens when I'm running on bare metal and I need this specialized driver and that specialized driver. In the cloud, it's a little bit easier - things look more or less the same from a hardware perspective. On bare metal, it gets harder. So you definitely can.
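As a sketch of what that “FROM and remove things” might look like - this is untested, the tag is a placeholder, and the internal layout of the kernel packages (/kernel, /kernel.tar) is my assumption, so verify against the actual image first:

```dockerfile
# Stage 1: the maintained kernel package
FROM linuxkit/kernel:<tag> AS src

# Stage 2: unpack the module tarball and prune drivers you don't need
FROM busybox AS trim
COPY --from=src /kernel.tar /kernel.tar
RUN mkdir /trimmed \
 && tar -C /trimmed -xf /kernel.tar \
 && rm -rf /trimmed/lib/modules/*/kernel/drivers/infiniband \
 && tar -C /trimmed -cf /kernel-trimmed.tar .

# Stage 3: reassemble a slimmer kernel package
FROM scratch
COPY --from=src /kernel /kernel
COPY --from=trim /kernel-trimmed.tar /kernel.tar
```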

I'm getting the stop sign, I love that, stop sign at the back of the room. So I will stop here because I don't want to run over. Thank you all very much. Hit me up afterwards with questions with pleasure, and I will be around and definitely attend the rest of the OS track talks today. They are fantastic.




Recorded at:

May 19, 2019