Bryan Cantrill on Rust and Why He Feels It’s The Biggest Change in Systems Development in His Career

Apr 12, 2019

Bryan Cantrill is the CTO of Joyent and well known for the development of DTrace at Sun Microsystems. Today on the podcast, Bryan discusses with Wes Reisz a bit about the origins of DTrace and then spends the rest of the time discussing why he feels Rust is the “biggest development in systems development in his career.” The podcast wraps with a bit about why Bryan feels we should be rewriting parts of the operating system in Rust.

Key Takeaways

DTrace came down to a desire to use Dynamic Program Text Modification to instrument running systems (much like debuggers do) and has its origins in Bryan’s time as an undergraduate.
When a programming language delivers something to you, it takes it from you in the runtime. The classic example of this is garbage collection. The programming language gives you the ability to use memory dynamically without thinking of how the memory is stored in the system, but then it’s going to exact a runtime cost.
One of the issues with C is that it just doesn’t compose well. You can’t necessarily pull a library off the Internet and use it well. Everyone’s C is laden with so many idiosyncrasies about how it’s used and about the contract on how memory is used.
Ownership is statically tracking who owns the structure. It’s ownership and the absence of GC that allows you to address the composability issues found in C.
It’s really easy in C to have integer overflow, which leads to memory safety issues that can be exploited by an attacker. Rust makes this pretty much impossible because it is very good at tracking how you use signed vs unsigned types.
You don’t want people solving the same problems over and over again. You want composability. You want abstractions. What you don’t want is to have removed so much developer friction that you develop code that is riddled with problems. For example, it slows a developer down to force them to run a linter, but it results in better artifacts. Rust effectively builds a lot of that linter checking into the memory management/type checking system.
While there’s some learning curve to Rust, it’s not that bad if you realize there are several core concepts you need to understand. Rust is one of those languages that you really need to learn in a structured way. Sit down with a book and learn it.
Rust struggles when you have objects that are multiply owned (such as a Doubly Linked List). It’s because it doesn’t know who owns what. While Rust supports unsafe operations, you should resist the temptation to develop with a lot of unsafe operations if you want the benefits of what Rust offers developers.
Firmware is a great spot for growing Rust development in a process of replacing bits of what we think of as the operating system.

Show Notes

Where did DTrace come from?

02:10 It came from my frustration with not understanding what software was doing.
02:15 On the one hand, it was so long ago; the initial ideas came when I was an undergraduate around 1995.
02:25 Even then, it felt like the stack of abstraction was so deep that I didn’t understand what was happening.
02:30 It was very hard to reason about software from first principles.
02:35 That was a much simpler time - if you look at now, things are much more complicated.
02:40 Software has this problem: when it’s actually running, you can’t actually see it.
02:45 In particular, I didn’t understand why we weren’t using dynamic text modification to instrument running systems, which is what debuggers have used for a long time.
02:55 Debuggers have mechanisms where you set a breakpoint, you change the program text to break into the operating system.
03:05 The program stops, you go into the debugger, print out whatever state you want and when you run it, it will single-step past the stop instruction.
03:15 I didn’t understand why we didn’t do something more intelligent using similar technology but allow things to run where they would record some amount of state but not otherwise stop the process.
03:30 I wanted to do that for a long time, and I came out to interview for Sun - reluctantly; I really wanted to work for a computer company, but in 1996 every computer company was mortgaging its future to Windows.
03:55 I wasn’t really interested in Windows; although Bill Gates is curing tuberculosis or whatever, he still robbed me of my childhood by depriving me of memory management.
04:05 I can’t get past it: I apologise for it, he’s done a laudable work for humanity, and it’s probably more than made up for the fact that we didn’t have a protected mode operating system for all of the 1980s.

I heard that now DTrace is available on Windows?

04:20 DTrace is available on Windows - this is true.
04:30 I’m not sure how it makes me feel - it gives me complicated feelings; I don’t know what I feel.
04:35 Generally, it’s very positive; it’s great for Windows - Windows itself has long since become a modern operating system.
04:40 There’s a lot of interesting technology being developed at Microsoft.
04:50 On the one hand, it should be unequivocally great; on the other hand, there is something that is a little bit weird about it.
04:55 I’m glad that it’s the power of open-source software, that they were able to take the technology and make it part of Windows to hopefully a larger group of people.

Tell me about the interview with Sun.

05:20 My experiences of Sun up until then had been pretty mixed, up until I met the engineers in the Solaris performance group.
05:25 In particular, Jeff Bonwick - Jeff and I were like a bolt of lightning.
05:35 You had a very energetic person who saw things as I did.
05:40 I had been asking people why something like DTrace didn’t exist.
05:50 Someone who I looked to as an expert had told me: “There must be some reason it can’t be done, because if it could be done it would be done by now.”
06:00 On the one hand, it’s a reasonable thing to say, but on the other it’s an unreasonable thing because it implies we have invented everything that can be invented, which is obviously wrong.
06:20 Sometimes really great ideas can start with these basic questions.
06:30 So when I came out to Sun, my question to Jeff was not “Is this possible?” but “Why haven’t you guys done this?”
06:40 What I remember was Jeff’s reaction, which was “Yeah, that seems like it should work - you should come to Sun and do that.”
06:55 That was very empowering, and I have tried to take that with me through my entire career that a 21 year old has the power to have an idea that could be really important, and they should be encouraged to pursue those ideas.
07:15 Even if the ideas don’t bear fruit, the process can be important and educational.

What were you doing in the meantime at Sun?

07:40 I came out to Sun in 1996, but there was too much work to be done to start work on DTrace.
07:50 I’m a strong believer in debugging - which sounds like a stupid thing to say, like “I’m pro air”
08:00 Debugging is a great way of learning about a system and contributing to an engineering team in a way that is not blocked on you.
08:20 I was debugging the system, just getting a visceral feel of what I wanted DTrace to do.
08:25 We started talking about DTrace as if it already existed, which was a very annoying thing to do.
08:30 Someone would have a problem, and we would say: “DTrace would solve this problem” - but it didn’t exist.

So it’s like marketing, before you have the push for it?

08:50 It shows that even when you’ve got a big and important idea, sometimes you want to take an indirect approach to get there.
08:55 You want to understand the problem space, and the five years between coming to Sun and starting to work on DTrace laid a lot of foundation.
09:05 By the time we went to go and work on DTrace, I was not naive: I knew exactly what I was looking for, what I wanted, and what we could do to deliver some early results.
09:20 We started delivering results pretty quickly - we were able to do things that couldn’t be done previously, and it was fun to be able to turn on the first spotlight in the system.
09:30 We were able to discover all sorts of things that people didn’t know existed.

I still remember the first “A-ha” moment when I first saw DTrace.

09:45 It’s great to have been part of that, and it’s fun to have developed a technology that took so many hardened technologists by surprise.
09:55 I had one customer who said “If I had known this was possible, I would have demanded it a long time ago.”
10:05 I still seek to do that - be a part of teams that are willing to do things that people don’t think are possible.
10:15 Sometimes you fall short of the mark; sometimes you learn along the way that the problem is a lot harder than you thought it was.
10:20 I still believe in those big bold swings, and when they connect they deliver such incredible value.

Speaking of your InfoQ talk - why should you rewrite an OS in Rust?

10:50 I have hit a hot button on that talk - that has been up for a couple of weeks, and 60k views - but they’ve left YouTube comments on.
11:05 You shouldn’t read the comments - even my 14 year old is saying it’s not a good idea.
11:15 That talk - for whatever reason - gets people riled up.
11:20 I’m not sure why - I tried to be somewhat circumspect about it.
11:25 I’m not saying we should throw out something that works and re-write it in Rust.
11:30 I do think that Rust is really interesting.
11:35 To me, it is the biggest development in the creation of system software in my career.
11:45 My career post-dates C/C++, so we have not had a big language break-through.
11:50 As a lot of systems people, we are cynical about what a programming language can deliver.
12:00 We have had to accept this trade-off, that when a programming language delivers more to you, it takes it from you at run-time.
12:10 For example, garbage collection - the programming language gives you the ability to use memory dynamically without thinking how that memory is restored to the system.
12:20 It’s then going to exact this horrible run-time cost in garbage collection.
12:25 It’s not to say that garbage collectors aren’t really cleverly implemented, or that there aren’t great implementations out there.
12:35 The problems they present are endemic: you have this performance pathology that you can’t get away from.
12:40 Memory consumption will effectively become CPU utilisation.
12:45 Even in a garbage collected language, you can have memory leaks or resource leaks.
12:50 Those will manifest themselves in painful ugly deaths, where instead of dying cleanly with out-of-memory errors, the system will work harder and harder to try and find garbage that doesn’t exist.
13:05 You then have this subset of your service that has become this tarpit, and anything that touches it is going to see outlying latency and you get cascades.
13:20 I need to break up with garbage collected languages for higher level services, and they certainly aren’t appropriate for the operating system.
13:30 People have played around over the years about having the operating system kernel written in a garbage collected language, but that’s insane - that’s not going to work.
13:40 It was insane at Sun, back in the day, when it was Java-everything.
13:45 Sun was so Java-centric that we ate in a cafe called “Java-Java”.
13:50 If you were not a Java person (which I obviously wasn’t) it was like “Can we eat in a neutral location?”
14:00 If we had had a cafe called “Solaris-Solaris” I don’t think the Java people would be into it.
14:10 I found myself at a bit of a crossroads; I’m sick of garbage collected languages, and I’m mindful of the fact that C is a pain in the arse for upstack software especially.
14:25 It is a pain in the arse to parse things in C, to deal with Strings - which isn’t to say it’s impossible.
14:30 In some cases, it’s not even difficult, but it is often tedious.
14:40 The other problem is that C doesn’t compose very well.
14:45 The biggest problem is that you can’t just pull a library off the internet: everyone’s C is laden with so many idiosyncrasies about how that’s used and the contract about how memory is used.
15:00 If I call your function, do you allocate this structure for me and pass it back, under what conditions do I free it, do I pass it to another function to free it?
15:10 We have worked our way around this stuff with strong conventions and disciplines about the way we write libraries - we being my tribe of C programmers.
15:25 My tribe transcends operating system and company - I know when I am in code written by one of my tribe, because it’s clean C code: well commented, structures have prefixes, function names all have the same prefix, and they all have a noun_verb kind of structure to them.
15:45 The reality is that that is not most C code - the tribe is small.
15:50 Even for the most basic things, I find that you have to write a bunch of stuff from scratch.
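
The questions raised at 15:00 (who allocates, who frees, who may hand the structure on) are ones a Rust function signature answers directly. The following is only a minimal sketch of that idea, not code from the podcast; `Config`, `make_config`, `describe` and `consume` are hypothetical names invented for illustration.

```rust
/// Hypothetical structure standing in for "the structure" a library hands back.
struct Config {
    name: String,
    retries: u32,
}

/// Returns an owned value: the caller owns it, and it is freed when the caller drops it.
fn make_config(name: &str) -> Config {
    Config { name: name.to_string(), retries: 3 }
}

/// Borrows the value: the caller keeps ownership and nothing is freed here.
fn describe(cfg: &Config) -> String {
    format!("{} (retries: {})", cfg.name, cfg.retries)
}

/// Takes ownership: the value is moved in and freed when this function returns.
fn consume(cfg: Config) -> u32 {
    cfg.retries
}

fn main() {
    let cfg = make_config("example");
    println!("{}", describe(&cfg));
    let retries = consume(cfg);
    // `cfg` has been moved; using it from here on would be a compile error.
    println!("retries = {}", retries);
}
```

Because the contract is part of the type (`Config`, `&Config`), a caller does not need to read the library's conventions to know who frees what.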

So is Rust attractive because of memory ownership?

16:05 Yes, the attraction of Rust is that the ownership model is actually novel.
16:10 It is ownership in the absence of garbage collection that allows you to address this composability problem.

So how does ownership work?

16:20 Ownership is statically tracking who owns the structure, such that the compiler can determine when a structure is no longer in use and can be freed.
16:35 You are getting all of the power of a garbage collected language, but there is no garbage collection because the system itself knows when things are no longer used.
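
To make the "compiler determines when a structure can be freed" point concrete, here is a small sketch (an illustration, not something shown in the podcast). The hypothetical `Tracked` type records into a log when it is dropped, so the order in which values are freed becomes visible.

```rust
use std::cell::RefCell;

/// Hypothetical type that writes to a shared log when it is freed.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed {}", self.name));
    }
}

/// Takes ownership of the value; it is freed when this function returns.
fn take(_t: Tracked<'_>) {}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = Tracked { name: "a", log: &log };
        let _b = Tracked { name: "b", log: &log };
        take(a); // ownership of `a` moves into `take`, which frees it
        log.borrow_mut().push("after take".to_string());
    } // `_b` goes out of scope and is freed here
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

No free call appears anywhere: the points at which `a` and `_b` are released follow from where ownership ends.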

Ownership is something you have to work with?

17:15 In order for Rust to do this really incredible thing for you, it’s going to constrain how things are used.
17:20 That’s the hang up that most people have, in that they hit those constraints and they don’t know why those constraints exist, only that the compiler is complaining.
17:35 From my perspective, when you are implementing in C, you can understand the assembly that the compiler is generating.
17:45 Similarly, when you are implementing in Rust, you can feel that underlying C that the compiler is trying to implement.
17:55 When the compiler is complaining because it is confused about who owns something, you understand why it’s complaining.
18:05 It can be frustrating - there have been times when you have had to jump through hoops to tell the compiler what’s happening.
18:15 The compiler has gotten a lot smarter recently, with non-lexical lifetimes.
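
As an illustration of what non-lexical lifetimes changed, the sketch below (hypothetical function name, not from the podcast) is accepted by current compilers because the shared borrow ends at its last use. Before non-lexical lifetimes, the borrow was treated as lasting to the end of the enclosing scope, and the `push` was rejected.

```rust
fn first_then_push() -> (i32, usize) {
    let mut values = vec![10, 20, 30];
    let first = &values[0]; // shared borrow of `values` starts here
    let copied = *first;    // last use of `first`: the borrow ends here
    values.push(40);        // so mutating `values` is allowed
    (copied, values.len())
}

fn main() {
    let (first, len) = first_then_push();
    println!("first = {}, len = {}", first, len);
}
```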

The learning curve for getting into Rust can be quite steep?

18:45 There is a learning curve to it, certainly - I found that to be less acute than I expected.
18:50 It may be that I was expecting something so acute and foreign that I was prepared for it.
19:00 It is true that Rust has a slightly higher early cognitive load.
19:05 I found it to be not that bad.
19:15 I am not a Rust expert, and when you go and look at some of these really creative crates using procedural macros, there’s definitely rocket science there.
19:25 I don’t want to imply that you can get a quick mastery of all of Rust.
19:30 It’s different from other languages where you understand the entire language but then find that there are sophisticated features you can do really nasty things with, like the pre-processor in C.
19:45 The pre-processor is absolutely essential in C.
19:50 If you were to take away the pre-processor, you would undermine the development of C.
19:55 C is not just C, it’s C and CPP, and CPP is historically an entirely different program.
20:05 You needed to understand that in order to effectively write C.
20:10 You had this long tail, and with Rust you are absorbing more of that up-front.
20:20 I found that I got fluid with Rust really quickly, and I found I could go back to Rust that I had written a while ago and still understand what is going on.
20:25 It feels like a low bar, but in this day and age, it is easy to have a lot of context, then write some code, then when you come back to it ask what you were thinking.

What’s the type safety story with Rust?

20:50 It’s very strongly typed; as a result, Rust gets very persnickety about things.
21:00 One of the things that I actually love is its overflow safety.
21:05 It’s really easy in C to have integer overflow, which often leads to memory-safety issues.
21:15 They won’t be of the kind that you’re likely to see in the natural order of things, but of the type that’s likely to be exploited by an attacker.
21:20 A classic way to exploit a C program is to convince it to do its bounds check incorrectly, by overflowing the bounds check.
21:30 This allows dereferences to access memory arbitrarily - the attacker can then sculpt what is dereferenced.
21:35 Rust makes that damn near impossible.
21:40 It is very good at determining how you are using unsigned versus signed types, and preventing that overflow or sign extension.
21:50 After years of using C and JavaScript, I found that JavaScript is type unsafe, to put it lightly.
22:10 It is so easy to have a typo in JavaScript that you don’t find until run-time.
22:15 The popularity of TypeScript shows that people are ready for much more strongly typed JavaScript.
22:25 I found that it’s such a relief to be back to really strong typing.
22:30 I like that strong typing throws the cognitive load back on to the developer.
22:45 I am wary of making software development too easy, in that I think it is a mistake to allow developers to develop code that they do not understand.
22:55 We have enshrined the notion of developer velocity - a term that I hate, because it implies that a developer is like a projectile - it implies that it should be fast to develop software, even if that causes nasty-to-debug problems in production.
23:10 When I look back on my career, I have spent more of my time debugging software than writing software.
23:15 I have spent more of my time debugging software written by others.
23:20 I don’t know if it’s karmic debt or surplus that I’m running, but I think it’s worth the developers of the world slowing down a bit and actually having more of that cognitive load in development, so maybe the artefacts we get into production don’t need to be debugged as much.
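
The overflowed bounds check described at 21:20 and the signed versus unsigned point at 21:40 can be sketched as follows. This is an illustrative example with hypothetical function names, not code from the podcast. It relies on two standard-library facts: `checked_add` returns `None` instead of wrapping, and there is no implicit conversion from `i32` to `usize`, so the conversion has to be written out and can fail.

```rust
use std::convert::TryFrom;

/// Returns the bytes in `offset..offset + len`, or `None` if that range
/// overflows or falls outside `buf`.
fn checked_slice(buf: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    let end = offset.checked_add(len)?; // `None` on overflow rather than wrapping
    buf.get(offset..end)                // `None` if the range is out of bounds
}

/// Indexes with a signed value; negative indices are rejected explicitly.
fn index_from_signed(buf: &[u8], i: i32) -> Option<u8> {
    let idx = usize::try_from(i).ok()?; // fails for negative values
    buf.get(idx).copied()
}

fn main() {
    let buf = [1u8, 2, 3, 4];
    println!("{:?}", checked_slice(&buf, 1, 2));
    println!("{:?}", checked_slice(&buf, 2, usize::MAX));
    println!("{:?}", index_from_signed(&buf, -1));
}
```

Plain `offset + len` would also be caught in a debug build, where arithmetic overflow panics; in a default release build it wraps, but slice indexing is still bounds-checked, so the result is a panic or a `None` rather than an arbitrary memory access.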

But with the Netflix paved road, there’s a reduction of friction, which is a benefit?

24:20 There is, so the art is you want people to not have the job be reduced to tedium - you don’t want people solving the same problems over and over again.
24:35 You want composability, you want abstractions - what you don’t want is where you have removed so much friction you can develop code that is riddled with hidden problems.
25:00 It forces the developers to slow down by running lint (or JSLint) on their code.
25:05 That is not going to make anyone faster - but it’s going to result in better artefacts.
25:15 What Rust effectively does is take that checking and have it fall out of the model that Rust has for memory management and type safety.
25:30 Once you’re up to speed, it’s actually possible to write it quickly; you develop a quick intuition as to where the borrow checker is going to complain at you.
25:40 Because you do have much better composability, you are able to get on that paved road very quickly.
25:50 Once you are up to speed on Rust, you are going to be able to develop quickly and correctly.
26:10 Once you are over the learning curve; there are some big bargains that Rust makes, that you have to understand.
26:15 You have to understand the ownership model in order to understand Rust.
26:20 To understand JavaScript, you have to understand closures.
26:25 I always thought that JavaScript should be a closure-centric approach.
26:40 With Rust, you absolutely have to understand the ownership model.
26:45 If you spend the time - Rust does need to be learned.
27:00 As developers, you haven’t had to do what you did when you first came to computer science and programming and sit down with a book and learn a new thing.
27:10 With Rust, that’s what you need to do.

What about concurrency?

27:20 I have honestly done very little with concurrent Rust - my Rust code has all been single threaded.
27:30 However, the way that the ownership model works lends itself very well to concurrency.
27:35 When I am writing single-threaded Rust code, it lights up multi-threaded parts of my brain in the way I think of the problem.
27:40 When you are borrowing memory, you can think of a function that you’re calling as another thread that you are handing control off to.
27:50 Even though it’s not, it mentally has that feel to it, so you can see how naturally that model lends itself very well when you want to have parallel execution.
28:05 The compiler knows that these two memory objects are not being used by these two threads in parallel, because I know this memory object is only being used in this scope over here.
28:15 That allows for pretty transparent parallel execution.
28:20 You can see how that could lead to a revolution in multi-threaded programming.
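
A small sketch of the point at 28:05, using the standard library's scoped threads (`std::thread::scope`). The function name `double_and_sum` is hypothetical and the example is an illustration, not something discussed in the podcast. The slice is split into two non-overlapping mutable halves, and the compiler accepts handing one half to each thread because it can see that the two borrows are disjoint.

```rust
use std::thread;

/// Doubles every element in place, using one thread per half, and returns the sum.
fn double_and_sum(data: &mut [u64]) -> u64 {
    let mid = data.len() / 2;
    // Two mutable borrows that are statically known not to overlap.
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        let l = s.spawn(|| {
            for x in left.iter_mut() {
                *x *= 2;
            }
            left.iter().sum::<u64>()
        });
        let r = s.spawn(|| {
            for x in right.iter_mut() {
                *x *= 2;
            }
            right.iter().sum::<u64>()
        });
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    let mut values = vec![1, 2, 3, 4];
    println!("sum = {}", double_and_sum(&mut values));
    println!("{:?}", values);
}
```

Trying to give both threads the same half would be rejected at compile time, which is the property the ownership model brings to multi-threaded code.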

What does Rust’s unsafe mean?

28:40 Rust allows for unsafe operation - it does so very explicitly.
28:45 I have not found it to be needed in pure Rust code, but there are times when you have Rust code that is handed a body of memory, and that needs to be wrapped in an unsafe operation.
29:05 This is generally used by those who are developing those abstractions that are widely used.
29:15 To make those operations efficient, you do want them to do direct pointer manipulation very carefully and that is scoped in an unsafe block.
29:30 That allows you to get that performance trade-off - you can selectively turn off Rust’s borrow checker and ownership checker, which allows you to do things you wouldn’t otherwise be able to do.
29:45 You can do that in a limited scope, but it gives you a composable abstraction (like a vector) that you can use everywhere, knowing that the unsafe bits have been properly audited.
30:00 People shouldn’t be using unsafe in their code in general, and if you are using unsafe routinely then it’s a problem.
30:05 It’s a problem in the way one is implementing Rust or that it may not be the right thing for that job.
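
The pattern described at 29:15 to 29:45, a small audited `unsafe` block hidden behind a safe interface, might look like the sketch below. `pair_mut` is a hypothetical helper written for illustration and is not from the podcast. In this sketch the `unsafe` block is what permits dereferencing raw pointers; ordinary references elsewhere in the program are still checked as usual.

```rust
/// Returns mutable references to two distinct elements of a slice,
/// or `None` if the indices are equal or out of bounds.
fn pair_mut<T>(items: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= items.len() || j >= items.len() {
        return None;
    }
    let base = items.as_mut_ptr();
    // SAFETY: `i` and `j` are in bounds and distinct, so the two references
    // point at different elements and cannot alias each other.
    unsafe { Some((&mut *base.add(i), &mut *base.add(j))) }
}

fn main() {
    let mut values = vec![1, 2, 3];
    if let Some((a, b)) = pair_mut(&mut values[..], 0, 2) {
        std::mem::swap(a, b);
    }
    println!("{:?}", values);
}
```

Callers only ever see the safe signature; the reasoning that needs auditing is confined to the three lines around the `SAFETY` comment.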

What are the kind of places that Rust shouldn’t be used?

30:25 I do think that Rust struggles, because the ownership model is so central, with data objects that are multiply owned.
30:40 If you have an object which is pointed to by more than one thing, Rust will struggle because it doesn’t know who owns it.
30:50 A common example is a doubly-linked list, which is actually two singly-linked lists with multiple ownership of each node.
31:05 Where people feel the pain of Rust is when they try to implement that doubly-linked list, and Rust really doesn’t want you to do that.
31:15 You want to resist the temptation to do that in unsafe operations - you want to take a step back, think of the problem you are trying to solve, and take a Rust approach to it.
31:25 Not because Rust is being dogmatic about whether doubly-linked lists are useful or not (because they are).
31:35 More because Rust wants to be able to give you the power of a garbage collected language without garbage collection - but you have to avoid these kinds of data structures.
31:50 There are ways to do it in Rust, but they are very complicated.
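
For readers curious what one of the safe "ways to do it" can look like, here is a sketch using the standard library's reference-counted pointers. It is one possible approach chosen for illustration, not the one Bryan has in mind, and all names are hypothetical. Forward links own the next node through `Rc`, while back links are non-owning `Weak` pointers, so each node still has a single owner.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type Link = Rc<RefCell<Node>>;

struct Node {
    value: i32,
    next: Option<Link>,                // owning forward link
    prev: Option<Weak<RefCell<Node>>>, // non-owning back link
}

/// Builds a list and returns its (head, tail), or `None` for empty input.
fn build(values: &[i32]) -> Option<(Link, Link)> {
    let mut head: Option<Link> = None;
    let mut tail: Option<Link> = None;
    for &value in values {
        let node = Rc::new(RefCell::new(Node { value, next: None, prev: None }));
        match tail.take() {
            Some(old) => {
                node.borrow_mut().prev = Some(Rc::downgrade(&old));
                old.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => head = Some(Rc::clone(&node)),
        }
        tail = Some(node);
    }
    Some((head?, tail?))
}

/// Walks the owning `next` links from the head.
fn forward(head: &Link) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = Some(Rc::clone(head));
    while let Some(node) = cur {
        out.push(node.borrow().value);
        cur = node.borrow().next.clone();
    }
    out
}

/// Walks the non-owning `prev` links from the tail.
fn backward(tail: &Link) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cur = Some(Rc::clone(tail));
    while let Some(node) = cur {
        out.push(node.borrow().value);
        cur = node.borrow().prev.as_ref().and_then(|w| w.upgrade());
    }
    out
}

fn main() {
    if let Some((head, tail)) = build(&[1, 2, 3]) {
        println!("{:?} {:?}", forward(&head), backward(&tail));
    }
}
```

The amount of machinery needed (`Rc`, `Weak`, `RefCell`, run-time borrow checks) gives a sense of why the structure is described above as painful in Rust.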

When we are talking about an operating system, what do we mean?

32:20 First of all, when you talk about user level in general, we mean operating at the unprivileged level of the microprocessor.
32:30 The kernel is running at the privileged level of the microprocessor.
32:35 When you’re executing in the operating system kernel, you can do anything with the hardware - when you’re in user level, you can’t.
32:45 When you are executing in “ring 0” on virtual hardware, you are in a virtualized ring 0 that’s been created by the hypervisor underneath you that is using the microprocessor’s support.
33:15 There’s a couple of rings below the hypervisor; on x86 Intel chips there’s something called SMM - System Management Mode - which is a very scary mode that the microprocessor can go into at any time and for any reason and do whatever it wants.
33:40 Any OS developer doesn’t like this at all, because the idea that the microprocessor can just disappear for a while and then come back to the OS kernel or hypervisor is disorienting to say the least.
33:50 In SMM, there is unseen software that is executing underneath the hypervisor.
34:05 Then there is something running underneath SMM, the Intel ME - Management Engine - which is software running yet deeper in the system.
34:15 There are all these layers, each of which thinks it controls the machine; which is the OS? All are at some level.

Are we building Operating Systems as monoliths?

34:35 It’s not so much a monolith as a layer cake - each of these was invented for a potentially good reason, but what we’re left with is a layer cake that can operate at cross-purposes with itself.
34:55 With Spectre and Meltdown we have been able to see down the layers of the layer cake.
35:10 I don’t think it’s necessarily wrong although I do think we have a scary amount of software that runs underneath us.
35:25 Especially for that unseen software, it is important that we bring a discipline to it.
35:30 It’s scary to me when that software is written in assembly or C.
35:35 That software, not just in the OS, in the SMM or the IME or the BMC or other unseen parts of the machine running firmware, I want to see that being written in a type-safe language.
35:50 I think Rust is a great fit for firmware.

So you can write parts of the OS in C and replace it piecemeal with Rust?

36:10 Because Rust doesn’t have a runtime, the objects that you get from the Rust compiler are relatively simple.
36:20 They are traditional objects that are devoid of runtime, that can take the control of execution and be done.
36:30 As a result, you can integrate Rust into C and C into Rust.
36:35 By being able to be integrated into C it can be integrated into multiple environments.
36:40 It’s potentially a very good fit for a lot of things, in that you can take an iterative approach.
35:50 For instance, you can take a device driver written in Rust, and have the system at large written in C, taking a similar approach to when we moved to structured programming languages from assembly.
37:00 We slowly re-wrote parts of the system in C that were in assembly.
37:10 In every operating system, there are important chunks that are still written in assembly.
37:15 I think that every operating system will have important chunks in assembly, chunks in C and eventually chunks in Rust.
37:20 I don’t think we’ll see the whole operating system written in Rust, but I do think we’ll see growing elements in Rust.
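
As a rough sketch of what "integrating Rust into C" at 36:30 involves, the function below uses the C calling convention, so a C caller could declare it as `uint32_t byte_sum(const uint8_t *data, size_t len);`. The name `byte_sum` is hypothetical and the example is not from the podcast. Exporting the symbol to a real C program additionally needs the `no_mangle` attribute and a library crate type; both are left out here so that the sketch runs as a plain program.

```rust
/// Sums `len` bytes starting at `data`, wrapping on overflow.
///
/// # Safety
/// `data` must be null or point to at least `len` readable bytes.
pub unsafe extern "C" fn byte_sum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller guarantees `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

fn main() {
    let data = [1u8, 2, 3];
    // Called from Rust here exactly as a C caller would: pointer plus length.
    let total = unsafe { byte_sum(data.as_ptr(), data.len()) };
    println!("total = {}", total);
}
```

The function body needs no runtime support beyond what the call itself provides, which is the property that makes piecemeal replacement of C code practical.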

So are we going to see DTrace written in Rust?

37:30 I thought about it - you could do the in-kernel components in Rust.
37:35 I think you would end up with so much unsafe operation that it would undermine the value.
37:40 One thing you could definitely do in Rust is the libdtrace user component that consumes the kernel output.
37:50 The dtrace utility could also be written in Rust.
37:55 I think that some of the earliest places to integrate Rust in the operating system are in the OS utilities and libraries.
38:05 I view the system libraries as part of the operating system; it’s not just the kernel that’s the operating system to me.
