
Fireside Chat w/ Docker CTO



Justin Cormack discusses the present and future of cloud tech, eBPF, isolation, kernel improvements, and more.


Justin Cormack is the CTO at Docker, working on unikernels.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.


Schuster: To continue the language discussion from before: we talked about ahead-of-time compilation and just-in-time compilation. Another dimension of this is garbage collection versus everything else, such as reference counting or manual memory management. Do you see that having an impact on cloud native languages?

Cormack: Yes, it's an interesting one, because I think that until Rust came along, everyone basically took the view that garbage collection was the right thing to do. It was so much easier, it solved all the problems, and the performance issues were over. A huge amount of engineering has gone into garbage collection over the years, both as a research topic and as an engineering topic, because it makes things so much easier for the user. Java did a huge amount of work, and a lot of other languages did a lot of work, on making garbage collection fast. It's quite a well-understood problem in many ways. For most applications, it's not really an issue. Then Rust came along and just said, we're not going to do it like that at all. We're not going to do garbage collection, we're going to be different, and it came up with the linear types piece, with some escapes from it, like going back to reference counting. I suppose the Apple ecosystem was actually the other big holdout, because with Swift and Objective-C, Apple said, we'll just do reference counting, it's good enough for everyone. It's much simpler and has more predictable performance than garbage collection, and that matters on the phone. We're going to introduce that everywhere else as well. That was the other holdout.
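To make the predictability point concrete, here is a minimal Rust sketch of reference counting freeing a value at the exact moment its last reference goes away, rather than at some later collection. `Tracked` and `demo` are illustrative names, not anything from the talk.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Records when it is dropped, so we can observe exactly when
// reference counting frees it.
struct Tracked<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns (strong count after cloning, freed after first release, freed after last release).
fn demo() -> (usize, bool, bool) {
    let dropped = Cell::new(false);
    let a = Rc::new(Tracked { dropped: &dropped });
    let b = Rc::clone(&a); // count: 1 -> 2
    let count = Rc::strong_count(&b);
    drop(a); // count: 2 -> 1, value still alive
    let after_first = dropped.get();
    drop(b); // count: 1 -> 0, value freed right here, not at a later GC pause
    let after_last = dropped.get();
    (count, after_first, after_last)
}

fn main() {
    let (count, first, last) = demo();
    println!("count={count}, freed after first drop={first}, after last={last}");
}
```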

Linear types originally came from work on trying to compile Haskell and other functional programming languages more efficiently. It was always one of the things that was going to make functional programming much more efficient, because it let you reason not about garbage collection, but about reusing the same memory location for a new value. In principle, in functional programming, you declare a new variable for everything, because you don't mutate in place. The naive implementation uses a different memory location each time you change something, which generally is not very efficient. Linearity was all about asking: can I guarantee that this thing is only used once? If so, I can reuse the same memory location, because nothing can hold a reference to its previous state, so I know I can update it. It was really about update-in-place linearity. It didn't turn out to be quite the miracle for compiling functional languages that was hoped for, I think, when the original papers were written. There are some really fun papers on linearity. There's one guy who wrote totally bonkers papers on it, which are just incredibly funny. I'll have to dig them out, because these papers were fun, from the early 2000s, late '90s.
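A minimal Rust sketch of the update-in-place idea, using a vector as the example: because `with_pushed` takes ownership, the caller can't use the old value again, so the "new" value can safely reuse the same heap buffer. Strictly speaking, Rust's ownership is an affine discipline (a value is used at most once) rather than a linear one (exactly once). The function names are hypothetical.

```rust
/// Functional-style update: the vector is taken by value, so ownership moves
/// in and the caller can no longer observe the old state. That makes it safe
/// to mutate the existing buffer instead of copying it.
fn with_pushed(mut v: Vec<i32>, x: i32) -> Vec<i32> {
    v.push(x);
    v
}

/// True if two successive "updates" reused the original heap buffer.
fn reuses_buffer() -> bool {
    let v: Vec<i32> = Vec::with_capacity(4);
    let before = v.as_ptr();
    let v = with_pushed(with_pushed(v, 1), 2);
    // Capacity 4 is enough for 2 elements, so push never reallocated.
    v.as_ptr() == before && v == vec![1, 2]
}

fn main() {
    let v = vec![10];
    let v2 = with_pushed(v, 20);
    // println!("{:?}", v); // would not compile: `v` was moved into with_pushed
    println!("{:?}, buffer reused: {}", v2, reuses_buffer());
}
```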

It didn't create quite the revolution there, but Rust took the same idea and tried to solve that problem: if we know that something is only used once, again, we don't have to put it on the heap, we can just update in place and keep control. Then, if there's only one thing that writes to it, we can hand out read-only references temporarily, and so on. That was a pretty radical break from the path that programming languages had been taking. I think it was very radical in terms of how people had to think about writing programs with it initially. It took time before the compiler could even recognize all the cases where there was linearity. The later implementations were more forgiving, but it does make you think about allocation differently. It's cognitive overhead to have to think about allocation at all, in one sense, but in another sense, if you care about performance, you've actually always thought about allocation. If you go to the high-performance Java talks that we have at QCon, the message is don't allocate. The first rule is: in a hot loop, never do any allocation at all. If there's no allocation, there's no garbage collection, so it's not a problem. People have always thought about allocation for things where performance matters. For throwaway code, or things that are not performance critical, garbage collection is just so convenient.
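The "no allocation in the hot loop" rule can be sketched in Rust as below, assuming a hypothetical word-processing loop: one scratch buffer is allocated up front and cleared, not freed, on every iteration.

```rust
/// Sums the byte lengths of the upper-cased words, reusing one scratch buffer.
/// `clear()` keeps the buffer's capacity, so the loop body only allocates if
/// a word's upper-case form exceeds the capacity reached so far.
fn total_upper_len(words: &[&str]) -> usize {
    let mut scratch = String::with_capacity(64); // allocated once, outside the loop
    let mut total = 0;
    for w in words {
        scratch.clear(); // empties the string but frees nothing
        for c in w.chars() {
            scratch.extend(c.to_uppercase());
        }
        total += scratch.len();
    }
    total
}

fn main() {
    let words = ["hot", "loop", "no", "alloc"];
    println!("{}", total_upper_len(&words));
}
```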


The approach that Zig took more recently is interesting too, because it came along after Rust, in a related niche. It's obviously much less well known than Rust. It's aimed more at replacing C, in the cases where you would definitely have to use C now, and it's very much designed around interoperability with C. Zig has a different model: rather than enforcing linearity, you have to explicitly pass an allocator to everything that could allocate, which makes allocation even more visible. You know that a function allocates memory, because you have to pass it an allocator. That lets you do things like arena allocation, where you pass in an area of memory, the function allocates in that, and afterwards you clean it all out by releasing the whole arena. That's actually similar to the garbage collection type strategies a lot of people use anyway; that's where the arena term came from, garbage collection. That's a different approach again.
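A rough Rust analogue of the Zig conventions described above, under stated assumptions: `Arena` is a hypothetical typed arena written for illustration (it is not Zig's `std.heap.ArenaAllocator` or any Rust crate), and the function that allocates receives it as an explicit parameter, the way a Zig function receives an allocator.

```rust
/// Hypothetical typed arena: values are only released together,
/// when the arena is reset or dropped. There is no per-item free.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Stores a value and returns a handle (an index) to it.
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1
    }

    fn get(&self, handle: usize) -> &T {
        &self.items[handle]
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    /// Releases everything at once.
    fn reset(&mut self) {
        self.items.clear();
    }
}

/// Zig-style convention: a function that allocates takes the
/// allocator (here, the arena) as an explicit parameter.
fn split_words(arena: &mut Arena<String>, text: &str) -> Vec<usize> {
    text.split_whitespace()
        .map(|w| arena.alloc(w.to_string()))
        .collect()
}

fn main() {
    let mut arena = Arena::new();
    let handles = split_words(&mut arena, "one request per arena");
    println!("{} words, first = {}", handles.len(), arena.get(handles[0]));
    arena.reset(); // end of request: release everything in one step
    println!("after reset: {}", arena.len());
}
```

Because nothing is freed individually, forgetting to release one value can't leak past the reset, which is the "halfway stage" Cormack describes next.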

Schuster: With Zig, do you have to manually free stuff?

Cormack: You have to manually free in Zig, but you can create an arena and deallocate the whole arena at once. You can do a halfway stage where you can lose track of stuff: you allocate a gig of memory and allocate everything into it. If you've failed to free something, it doesn't matter, because you're going to free the whole thing anyway. You don't have to keep track of things as carefully, but you have to understand and create that strategy yourself. Normally, the default would be a C-like approach where you have to free things and they'll leak if you don't, which is bad, but making the allocator first class lets you create these strategies yourself.

Zig also does other things, like failing allocators. Most languages don't work well when running out of memory, but in Zig, because you pass an allocator, everything you pass an allocator to can fail. Allocation can return an out-of-memory error, so everything has to handle a failed allocation. You can also effectively write code with hard memory bounds, which has actually been really difficult. Code that adapts to the amount of memory that's available has been really hard to write. Linux doesn't encourage that either, because it normally allows lax allocation with overcommit unless you turn it off. There are a lot of cases where you're in a runtime and you really want to know if anything's failing, and be able to catch running out of memory rather than just getting the standard crash.
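Safe Rust's default `Vec` aborts the process when allocation fails, so there is no direct equivalent of Zig's failing allocators, but the idea can be sketched. `Budget` and `OutOfMemory` are hypothetical types that make every allocation return a `Result` the caller must handle; `Vec::try_reserve` is the real standard-library way to request capacity without aborting.

```rust
use std::collections::TryReserveError;

/// Error returned when a request would exceed the budget (hypothetical type,
/// loosely mirroring Zig's `error.OutOfMemory`).
#[derive(Debug, PartialEq)]
struct OutOfMemory;

/// Hypothetical memory budget that every allocation must go through.
struct Budget {
    remaining: usize,
}

impl Budget {
    /// Hands out a zeroed buffer, or an error the caller must handle.
    /// (Simplification: returned buffers are never credited back.)
    fn alloc_bytes(&mut self, n: usize) -> Result<Vec<u8>, OutOfMemory> {
        if n > self.remaining {
            return Err(OutOfMemory);
        }
        self.remaining -= n;
        Ok(vec![0; n])
    }
}

/// Standard Rust's fallible growth: `try_reserve` reports failure as an
/// error value instead of aborting the process.
fn try_buffer(n: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut v = Vec::new();
    v.try_reserve(n)?;
    Ok(v)
}

fn main() {
    let mut budget = Budget { remaining: 100 };
    for _ in 0..2 {
        match budget.alloc_bytes(60) {
            Ok(buf) => println!("got {} bytes", buf.len()),
            Err(e) => println!("degrade gracefully: {e:?}"),
        }
    }
    println!("huge request ok? {}", try_buffer(usize::MAX).is_ok());
}
```

With Linux overcommit enabled, a moderately sized `try_reserve` can still succeed when the memory isn't really available, which is the limitation mentioned above; the `usize::MAX` request fails because it exceeds what a `Vec` can ever hold.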

For the right kind of low-level code, it's really important. Most code actually isn't as resilient to running out of memory as it should be. That's often something you can use to cause denial-of-service attacks, by passing gigantic parameters to things and watching them crash because they run out of memory. Or if they don't have backpressure, you can make them just accumulate memory, and they don't fail correctly or feed the backpressure back. I think for resilience, explicitness in memory allocation is important as well. I think since Rust there's been a new era of experimentation with different models, which is really valuable to find out what works and what doesn't. In what kinds of cases does it matter? What kinds of things can we build if we do something different? How easy is it to learn something like linear types, which was an academic construct for Haskell? Actually, it came from math before that.
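Both failure modes mentioned here can be sketched with the Rust standard library: a hypothetical size cap (`MAX_REQUEST_BYTES`) rejects gigantic inputs before anything is allocated for them, and a bounded `sync_channel` pushes back on producers instead of letting a queue grow without limit. The limit values are arbitrary assumptions.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical limit on request size: reject gigantic inputs up front.
const MAX_REQUEST_BYTES: usize = 1024;

fn accept_request(body: &[u8]) -> Result<&[u8], &'static str> {
    if body.len() > MAX_REQUEST_BYTES {
        return Err("request too large");
    }
    Ok(body)
}

/// A bounded queue applies backpressure: when the consumer falls behind,
/// producers get `Full` instead of the queue growing without limit.
/// Returns (accepted, rejected).
fn enqueue_all(n: usize, capacity: usize) -> (usize, usize) {
    // `_rx` (not `_`) keeps the receiver alive, but nothing drains it here.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let (mut accepted, mut rejected) = (0, 0);
    for i in 0..n {
        match tx.try_send(i as u32) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => rejected += 1, // tell upstream to slow down
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, rejected)
}

fn main() {
    println!("{:?}", enqueue_all(5, 2));
    println!("{:?}", accept_request(&[0u8; 4096]).err());
}
```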

Languages That Emulate the Rust Approach

Schuster: It's definitely a big change with Rust, because I never thought we would have another manually memory-managed language, or a language that doesn't use garbage collection, but here it is. Languages that also use the Rust approach: you mentioned Zig.

Cormack: Zig doesn't use the Rust approach.

Schuster: Not GC, yes.

Cormack: Yes, it doesn't have GC. Rust, Zig, and Swift are the only three I can think of that are doing that. I think it takes time to learn how these things work, and we might well see more.

C++ and Reference Counting

Schuster: Every time I try to poke fun at C++ developers for having a manually managed system, they always tell me, no, in modern C++, everything's reference counted, with smart pointers.

Cormack: Yes. It's not enforced, but it is used mainly as a reference-counted language. C++ has been very adaptable from that point of view. If you see it as an evolving language that adds new idioms without really removing anything, then, yes, you can definitely argue it's in the reference-counted camp. I don't know if they'll go as far as linearity as well. Some people call reference counting garbage collection. It's just a very simple form of garbage collection that doesn't work well with cycles, but is incredibly efficient to implement. Reference counting is very old. I think the original garbage collection implementations in the '60s were reference counting, because it's so simple. You can argue that it's more a continuation of GC. Even Rust has support for reference counting, for all the use cases where linearity doesn't work.
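Here is a minimal Rust sketch of how Rust's reference counting copes with the cycle problem, assuming a simple parent/child tree: the parent holds strong `Rc` pointers to its children, and each child's back-pointer is a `Weak` that doesn't keep the parent alive. `Node`, `node`, and `demo` are illustrative names.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    // Weak edge up to the parent, strong edges down to children.
    // Strong edges in both directions would form a cycle that is never freed.
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn node() -> Rc<Node> {
    Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

/// Returns (parent strong count, parent weak count, parent freed after drop).
fn demo() -> (usize, usize, bool) {
    let child = node();
    let parent = node();
    parent.children.borrow_mut().push(Rc::clone(&child));
    *child.parent.borrow_mut() = Rc::downgrade(&parent);

    let strong = Rc::strong_count(&parent); // 1: only the `parent` binding
    let weak = Rc::weak_count(&parent); // 1: the child's back-pointer
    drop(parent); // strong count hits 0, so the parent really is freed
    let parent_gone = child.parent.borrow().upgrade().is_none();
    (strong, weak, parent_gone)
}

fn main() {
    println!("{:?}", demo());
}
```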

Schuster: What's interesting with Rust is that the type system helps make sure you get the cheap reference counting operations when you don't share across thread boundaries, for instance, which is exciting. You can use reference counting without paying the atomic operation overhead.

Cormack: Yes.
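The distinction Schuster describes, sketched in Rust: `Rc` updates its count non-atomically and is not `Send`, so the compiler rejects moving it to another thread, while `Arc` pays for atomic updates in exchange for being shareable across threads. The function names and the striped summation are illustrative assumptions.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Single-threaded sharing with Rc: count updates are ordinary increments.
fn local_count() -> usize {
    let data = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&data); // plain increment, no atomic instruction
    // thread::spawn(move || alias.len()); // would not compile: Rc is not Send
    Rc::strong_count(&alias)
}

/// Sharing across threads requires Arc, whose count updates are atomic.
/// Each thread sums every `threads`-th element; `threads` must be > 0.
fn parallel_sum(data: Vec<u64>, threads: usize) -> (u64, usize) {
    assert!(threads > 0, "need at least one thread");
    let data = Arc::new(data);
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let data = Arc::clone(&data); // atomic increment
            thread::spawn(move || data.iter().skip(t).step_by(threads).sum::<u64>())
        })
        .collect();
    let total = handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>();
    // Each thread's clone was dropped when its closure finished.
    (total, Arc::strong_count(&data))
}

fn main() {
    println!("{}", local_count());
    println!("{:?}", parallel_sum((1..=100).collect(), 4));
}
```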

Performance, Scalability, Cache Locality, and Dealing with Cycles

Schuster: We can talk more about languages, but we have some topics here: performance, scalability, cache locality, dealing with cycles.

Cormack: I do find it interesting that these tradeoffs keep coming back. It's always been the case that systems programmers, particularly C programmers, said that garbage collection would never work for use case X. For many of those use cases, that hasn't been true. There are some use cases in the kernel, but even there, with unikernels, we have OCaml and other languages using garbage collection in the kernel. It's more niche when you go down to that level. As I said, people who care about performance understand how allocation and deallocation are taking place, regardless of what the runtime does, and they still do the same things. You can do these things by hand: not allocate, know that things are going to be allocated on the stack in your language, and all the rest of it. They trace and decompile to see what's going on in the runtime, and you can always build something equivalent by just not using these features.

Memory Overhead Cost of GC versus Reference Counting

Schuster: The one question I always have, and I haven't found any good research on it, though maybe it's out there, is: what is the memory overhead cost of GC versus reference counting? The answer isn't that simple, because modern reference counting is becoming a bit lazy as well, so you don't immediately deallocate stuff. There might still be some overhead.

Cormack: The worst case for garbage collection is, theoretically, twice as much memory, with a really poor implementation of copying collection, but that's not how it works in practice. It's invariably less than that, but it depends a lot on your workload. Most GCs have arenas for things that last for different amounts of time. Most stuff is deallocated very quickly after it's allocated; it doesn't hang around for very long, and it's removed quite efficiently. It's almost like a stack lifetime. Then you get stuff that lives a little bit longer: maybe it crosses a function boundary and then gets deallocated. Then you've got stuff that probably lasts the lifetime of your program and stays, but maybe changes slowly. It is very application dependent. There is a bunch of research on memory usage. I suspect the memory usage has gone down since that research was done, because GCs have continued to get more efficient. It's probably worth measuring, because your application might be different. Few people rewrite things and compare afterwards, either.

Schuster: That's the problem. Yes.

The Cognitive Load of Memory Management

Schuster: Cognitive load of memory management: doing it by hand, trying to debug why it's not working right or quickly, and frontloading the cognitive load [inaudible 00:19:31].

Cormack: I think it's like the real backlash we've seen against dynamic typing over the last few years. Let's do things ahead of time, let's put things in the type system, let's have more of those guarantees up front again. We went from doing things at runtime, the dynamic language boom and the JIT compiler boom, back to the ahead-of-time old world, in a sense, to how things used to be, because predictability is important. Predictability is a really interesting thing. If you want predictable latency rather than the lowest latency, if you want to make sure that your largest latency is low, then doing things ahead of time and not doing GC removes the tail latencies from when things go wrong. For those types of applications, it makes a lot of sense. The developer cognitive load and debug load is interesting. As you said, one of the interesting things about linear types was that they went into the type system, not somewhere else, so you could use familiar tools, to some extent. It still behaved differently, and extending the type system does add cognitive load. That's what a lot of people say about things like Idris, which tries to do a lot more in the type system. It does become very different. It's definitely that ahead-of-time thing: once you've got it right, you know it's done, rather than having to profile it and see if it worked.

Extended Berkeley Packet Filter (eBPF)

Schuster: Can we switch gears from languages to something a bit lower level? A few years ago at QCon London, you gave a talk called eBPF is the Amazon Lambda of the kernel. Did that work out?

Cormack: It's interesting. I think there's a lot going on, and that community is still growing and expanding. There have been a lot of new things: an eBPF Foundation was set up under the Linux Foundation, and Microsoft is implementing eBPF for Windows. Those are big things. Linux stuff takes a long time, because a lot of people use quite old Linux kernels, like Red Hat's long-term support releases. It's only recently that the parts of eBPF other than the profiling pieces have become really widely available. It's still not the easiest ecosystem to program, because it's low level and weird, and most people are still programming in C. It's still in the growing stage, but I'm seeing a lot more people interested in adopting it, and a lot of companies doing things with it. It's got really strong niches around things like networking. The first thing that happened with it was the performance work, done mainly at Netflix originally, then a lot of networking, because it turned out that manipulating packets was a really good use case. Now things like security are growing: making more complex security decisions about whether something should be allowed, which needs more context. Yes, it's getting there. It's still on that adoption curve, taking time and expanding the tooling.

eBPF in the Future

Schuster: What things are being added that might expand it even more in the future? And what can we expect from eBPF today; what things are stable?

Cormack: The handling of kernel interfaces is getting much easier. There was a whole revision of how compilation works, so you don't have to compile a separate eBPF binary for every kernel version you want to run on; instead, it's linked up at load time. Before that, it was annoying to distribute code except as source. I think that will remove a lot of barriers, because it was just annoying to distribute binaries to people that would work. That ease-of-use piece around distribution is probably the biggest one.

eBPF vs. Sidecars

Schuster: I don't know if you have an opinion on this. There was a discussion recently that was pitching eBPF versus sidecars, from an efficiency point of view, capability point of view. Is there anything to say there?

Cormack: I think a lot of people are now using a combination: basically, eBPF if you can, because it's much faster, and a sidecar if you can't, because eBPF is difficult, you haven't written the code yet, or the manipulations you want to do are more complicated. The other thing on the networking side that hasn't quite taken off yet is in-kernel TLS, which was going to make this easier. If the kernel does the encryption, the eBPF code can access the data before it's encrypted, so it can potentially inspect more and make processing decisions before the encryption happens. There's a whole lot of optimizations you can do. If two connections are actually on the same host, you can just loop them back through localhost and not bother to encrypt and decrypt, or not packetize them and run TCP over a local connection, and things like that, which makes for a lot of efficiency.

The sidecar model is simpler. The idea that you can take some bit of code and completely control how it handles the outside world, with code that you don't have to deploy at the same time, is really powerful. I think there's a lot more we could do there. With Docker Desktop, we manipulate a lot of the network traffic coming out of containers before it reaches the host, because it crosses a VM boundary, so we get that opportunity. If you could do that for a local process, effectively containerizing it without it realizing, routing its traffic differently, and making security decisions locally, all as a control wrapper around some application you want to run, that's an extension of the container model, in that sense.

Sidecars were always pitched originally as a replacement for dynamically linked libraries. They're dynamically linked bits of program that manipulate the outputs, which are generally network streams and so on. If you can do this even more generally, so that you can manipulate, control, and monitor all of the state that the application deals with on Linux, then that's a really powerful way of thinking about containerization, virtualization, and just modifying how things work.

One of the things that Cilium does is intercept DNS requests using eBPF for access control. If you're only supposed to access certain names, it intercepts the DNS request and looks at the response. If you ask for an allowed name and the response comes back with an IP address, that lets you make calls to that IP address. If you ask for a lookup you're not allowed, the response comes back without the IP address, so your application can't actually do anything. Then it configures the allowed and disallowed routes based on those DNS requests. That model gives the operating system more semantics for controlling you, and makes it totally transparent: you don't need a DNS library, or to work out which DNS server the application is even talking to. You know that an application you control can only resolve names through DNS, and hard-coded IPs can just be blocked anyway, because no DNS request was made for them. It's a really interesting way of adding semantic control around what things can do.
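A simplified model of the DNS-based policy described above, written as a Rust sketch. This is a hypothetical illustration of the idea, not Cilium's implementation or API: only allowlisted names get answers, and connections are allowed only to addresses learned from those answers, so hard-coded IPs are rejected. The addresses come from the documentation ranges.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical DNS-aware egress policy.
struct DnsPolicy {
    allowed_names: HashSet<String>,
    allowed_ips: HashSet<IpAddr>,
}

impl DnsPolicy {
    fn new(names: &[&str]) -> Self {
        DnsPolicy {
            allowed_names: names.iter().map(|n| n.to_string()).collect(),
            allowed_ips: HashSet::new(),
        }
    }

    /// Called with a DNS response observed for `name`. Answers for allowed
    /// names pass through and are recorded; other answers are withheld.
    fn on_dns_response(&mut self, name: &str, answers: &[IpAddr]) -> Vec<IpAddr> {
        if !self.allowed_names.contains(name) {
            return Vec::new(); // the application never learns an address
        }
        self.allowed_ips.extend(answers.iter().copied());
        answers.to_vec()
    }

    /// Connections are allowed only to addresses from permitted answers,
    /// so an IP that never came from a DNS lookup is blocked.
    fn may_connect(&self, ip: IpAddr) -> bool {
        self.allowed_ips.contains(&ip)
    }
}

fn main() {
    let mut policy = DnsPolicy::new(&["api.example.com"]);
    let api: IpAddr = "192.0.2.10".parse().unwrap();
    let hardcoded: IpAddr = "198.51.100.7".parse().unwrap();
    policy.on_dns_response("api.example.com", &[api]);
    println!("api allowed: {}", policy.may_connect(api));
    println!("hard-coded allowed: {}", policy.may_connect(hardcoded));
}
```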

Things like access control and enforcing TLS are some of the things sidecars were initially asked to do, because getting application developers to build in the whole authorization and access control framework your organization wants is really hard, and linking to a shared library stopped working once people started using multiple languages. We don't have that common linkage yet, although WebAssembly is looking at common library type things that could be multi-language; we're not really quite there yet.

The Security of eBPF Code

Schuster: I saw there were some concerns about the security of eBPF code, since you're essentially injecting code that runs in the kernel. What's the current thinking about that?

Cormack: I know we talked about sandboxing before. It needs to run in a sandbox, so the question is how good the sandbox is. The security issues in eBPF have come from changes made to extend it without the model being updated to match; changes to the model have generally caused issues. Linux decided not to formally verify eBPF, but I think it's an interesting research target. Most of the issues so far have been found quite quickly. It wasn't designed using the best-practice approaches we have for designing sandboxes; it was a little more ad hoc, and there have been a number of issues, particularly as the allowed constructs got more complicated. Initially it was really simple, based on the original BPF for networking, but more features were added, and those features turned out to have security issues. It's good to plan how you're going to implement sandboxes, because they will break and they do cause issues, so it's good to get them right.
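To illustrate what "verify before it runs in the kernel" means, here is a toy verifier in Rust for a hypothetical four-register instruction set. It enforces rules in the spirit of classic BPF (forward-only jumps, so every accepted program terminates), checked before anything executes. The real eBPF verifier is far more elaborate: it tracks register types and value ranges, and newer kernels accept some bounded loops.

```rust
/// Toy instruction set (hypothetical; far simpler than real eBPF).
enum Insn {
    LoadImm(u8, i64),      // reg = imm
    Add(u8, u8),           // reg_a += reg_b
    JumpIfZero(u8, isize), // if reg == 0, jump to pc + 1 + offset
    Exit,
}

const NUM_REGS: u8 = 4;

fn reg_ok(r: u8) -> bool {
    r < NUM_REGS
}

/// Accepts a program only if every register index is valid, every jump is
/// forward and stays inside the program, and the program ends with Exit.
/// With forward-only jumps the program counter only increases, so each
/// instruction runs at most once and termination is guaranteed.
fn verify(prog: &[Insn]) -> Result<(), String> {
    if !matches!(prog.last(), Some(Insn::Exit)) {
        return Err("program must end with Exit".to_string());
    }
    for (pc, insn) in prog.iter().enumerate() {
        match *insn {
            Insn::LoadImm(r, _) if !reg_ok(r) => return Err(format!("bad register at {pc}")),
            Insn::Add(a, b) if !reg_ok(a) || !reg_ok(b) => {
                return Err(format!("bad register at {pc}"))
            }
            Insn::JumpIfZero(r, off) => {
                if !reg_ok(r) {
                    return Err(format!("bad register at {pc}"));
                }
                if off < 0 {
                    return Err(format!("backward jump at {pc}"));
                }
                if pc + 1 + off as usize >= prog.len() {
                    return Err(format!("jump at {pc} leaves the program"));
                }
            }
            _ => {}
        }
    }
    Ok(())
}

fn main() {
    let ok = vec![Insn::LoadImm(0, 0), Insn::JumpIfZero(0, 1), Insn::Add(0, 1), Insn::Exit];
    let looping = vec![Insn::LoadImm(0, 0), Insn::JumpIfZero(0, -2), Insn::Exit];
    println!("{:?}", verify(&ok));
    println!("{:?}", verify(&looping));
}
```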

Interesting Projects in the CNCF

Schuster: Justin you're involved with the CNCF.

Cormack: Yes.

Schuster: Are there any projects there that you find particularly interesting, that people should look into?

Cormack: We have a lot of projects now. What we've really done over the last year or so is open up the sandbox to an enormous number of experiments and projects that may or may not make it. I've lost track. I think there are 40-odd projects at the incubating and graduated levels, the more mature projects, and there are a lot of sandbox projects. There's a core of projects connected directly to enhancing Kubernetes and extending it in different ways, and there are a lot of interesting ones there. We've got things like Dapr, which came in recently and which Microsoft is using for their new container runtime on Azure. It had already been open sourced, but it's now a CNCF project. We've got things like Crossplane, which lets you run services as if they were native Kubernetes objects. I think it's really interesting.

In the sandbox, we've got all sorts of things: Wasm runtimes, confidential computing, key management and encryption, lots of projects in Rust. We've got Wasm for Kubernetes from Microsoft. It's a big experiment in things that may or may not work out, and we're giving them the space to explore. We give visibility to projects that people want to collaborate on across companies and organizations, and share with other people. They don't want to own the whole direction of the project; they want to make it open. We sometimes see projects come in at quite early stages. We usually require that they've been around for a little while, but if people are already starting to work together, and we're seeing multiple people coming in, then that's a good point at which we allow projects in. There are really a lot of fun things, a lot of experiments.

I think there's a huge number of projects around different aspects of edge computing, and that's where a lot of the Wasm things fall. No one knows quite what the shape of that stuff is going to be. There are a lot of different approaches. We have several different projects extending Kubernetes to the edge, in different architectures and in different ways. That's great, because we don't know what the shape of things is going to look like. We have people who are really excited about WebAssembly and the kinds of applications people might build with it. I think we're still exploring what that will look like, and doing it in the open. We're a big community of people who can really drive projects to success.

Was the Unikernel Killed by Docker and Containers?

Schuster: Did Docker and containers kill the unikernel, or is the unikernel still around?

Cormack: There are a bunch of people still working on unikernels. They don't go away, because the ideas are really exciting and they're much easier to implement than they used to be. I think the commercial models around unikernels have been really difficult: how do you get mass adoption rather than niche adoption? It's hard. What are you going to offer developers that's radically different? I think some of the things related to WebAssembly and lightweight applications at the edge are where people are starting to get interested again. I think a lot of the ideas will come back over the next few years as people explore those areas in more detail, because small, lightweight, customized, single-purpose applications for smaller, less resource-hungry environments are something people are really interested in again now. We started building things that were big, and now we've got reasons for thinking about small again. That's where the interest still comes up a lot, and I still talk to people about it. People are still working on it, but it's more hidden.

Schuster: It's definitely exciting, maybe long-term.




Recorded at: Sep 16, 2022