The Talk You've Been Await-ing for



Steve Klabnik goes over the deep details of how async/await works in Rust, covering concepts like coroutines, generators, stack-less vs stack-ful, "pinning", and more.


Steve Klabnik is on the core team of Rust, leads the documentation team, and is an author of "The Rust Programming Language." He is a frequent speaker at conferences and is a prolific open source contributor, previously working on projects such as Ruby and Ruby on Rails.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Klabnik: I'm Steve [Klabnik], I'm here to give you this talk with this terrible or awesome pun depending on your opinion, "The Talk You've Been Await-ing For." I work at CloudFlare on some stuff totally unrelated to all these things, but before that, I worked at Mozilla on the Rust programming language and I'm still working on the Rust language in a hobby capacity these days. I've been on the core team since 2014 and I wrote the book, "The Rust Programming Language," if you're trying to learn Rust. This is a talk that I've been really super psyched to give for a very long time and it's about how async/await works in Rust at a very deep technical level.

This is a screenshot from this blogpost. This actually happened last Thursday. We released async/await in Rust, and this is the culmination of maybe three or four years of development in total, and it was the work of a lot of people, all of whom were not me. One of the roles I play in the Rust world is amateur historian. I like to collect what we've been doing and then present it to people. I'll make it super clear that I didn't do any of the work to design this feature. I'm just here to explain it to you, and I have some citations. This talk specifically draws on some really great blogposts written by some of the people who did do all of that work, and so I have some links at the end and you can also go read a bunch of that stuff.

Previously at QCon, I gave a talk called "Rust’s Journey to async/await." That talk was given about a year ago and it was a very high level overview of why Rust cared about async/await and an overview of different concurrency methods. If you want a high level version of the talk, you can watch that previous one. I'm really excited for this, now that we've actually shipped the feature to talk about the low level details. We're going to talk about how asynchronous functions desugar into more complicated Futures and there is a lot of code in this talk and a lot of type signatures.

How many of you have written Rust before or would think that you're reasonably competent at Rust? For those of you who are not, hopefully you can follow along, I am sorry if it's a little tough. I'm going to try to take my time and go through things, but there's a lot of stuff going on here, and I really want to get deep into the nitty gritty because I think that's really the most interesting part.

I picked this image because it's a meme but also, I think it really fits with the theme of this track because Rust's implementation of async/await is in many ways actually novel. It's not novel in the sense that we invented brand new things, but we assembled a bunch of existing concepts together in a way that had not really previously been done, and that's part of why it took us so long, but that's also why I think it's really interesting and exciting. I think we have the most performant implementation of async/await that could ever exist. We'll see if someone proves me wrong someday, but when you look at overheads and stuff, it's very low, and so we're going to get into how all that works. It involves a lot of moving parts.

Fundamentally, we're going to talk about three things, and then depending on if I have time, we'll cover one last little bonus round. I usually give talks that are not very code heavy and so I know how long those are, but this is a very code-heavy talk so I wanted to put a little bit of extra stuff in there, so maybe we'll get to async functions in traits. The three core things to understand when you want to get into async/await in Rust at a low level are, first of all, the concept of async/await and also Futures, which is the main feature that's backing it. Then secondarily, this thing called Generators, which is the underlying implementation of async/await. Then finally, "Tasks, Executors, and Reactors, oh my!" which is how the Futures actually execute, and there are some different parts that are broken up into that. Those first three things are the most important to understand, and the bonus round is an interesting extension to the language that I personally find exciting, so it's a little bit more about the future. Let's get into it.

async/await and Futures

First of all, we're going to talk about async/await and Futures. First of all, while I've mentioned this talk is going to have a lot of code and a lot of complicated type signatures, I wanted to show you how you as a user would write code with async/await in Rust. Here's an example taken from Tokio's docs. Tokio is a package that implements a bunch of the stuff that we're about to talk about so that you don't have to do it yourself. This is a very simple program. You can see on line six we have a TcpListener where we listen to a TCP connection on a given local host, port 8080. Then we loop over accepting connections over that listener. Then finally, we do some processing, which I actually eliminated.

You can see that we've got an async function main up here and we've got a couple of lines that are marked .await. The question mark is for error handling purposes in Rust. If you're not familiar with that, it basically says: if there's an error, then return the error, otherwise, return the value. This is the way it actually feels to write normal code with this stuff. You have async functions, you call .await sometimes in the body and you just go on with your day. The main reason people really wanted this is because the ergonomic benefits are pretty massive. This is much simpler than what it desugars to, which is what we're about to get into.

Finally, I also wanted to mention that we have this concept called an async block. You don't actually need to have a full function for async/await. You can also do async and then a curly brace. In this case, the move is there for reasons I don't want to get into at the moment, but you can also take portions of your code and wrap them up in async/await as well. It doesn't actually have to be at the function boundary. That's useful because, in this example, there's this API called Tokio spawn, and that basically spins up a new task. We're going to talk a lot about tasks in the future, but the idea is that it spins up a new unit of work, and so rather than make you write a brand new function every time, passing an async block to it makes sense. This is what writing code would feel like, and the way that most people would interact with async/await.

If you want to start understanding it beyond the surface, at a deeper level, the very first thing that you need to understand is that async/await is simpler syntax for writing this thing called a Future in Rust. There's actually technically an asterisk there. It's a little more complicated than that but that's the high level way to think about it. We're going to get into that asterisk in just a minute but ultimately, that's really the way that you want to think about it. A Future is this concept that represents some value that will exist sometime in the future. This is the core idea of asynchronicity, but this reifies it into an actual thing, and so instead of needing to have a callback, which may be executed sometime in the future, the idea is you have these objects, and Rust doesn't really have objects in the OO sense, but a bit of data that you can basically ask, "Does that thing exist yet?" It'll either tell you, "It doesn't exist yet. Check back in with me later," or it'll say, "Yes, it exists. Here's the value." That's the core idea, and so that's the basic principle. That lets us represent asynchronicity in a way that works really well with everything else.

The first thing we're going to do is build a Future. Specifically, we have this example called a timer Future. What's going to happen is, this is a Future that goes to sleep for a certain amount of time and then after that amount of time happens, it will wake up and say, "Here is this particular value." This is a great "hello world" example of building a Future by hand, and while async/await builds Futures for you, at the lowest level, people who write libraries will implement Futures by hand. I'm going to show you how to do that so you can see what the guts look like, because we're going to talk a lot more about the guts.

The implementation strategy we're taking for this timer Future is very simple. You could write a much more efficient one than this, and that's also true of all of the code examples that I'm going to show in this talk. I'm trying to show examples that are easier to understand, not necessarily the most efficient. The folks who write libraries that do this for production grade things will be writing significantly more complicated code, but it will have more performance. In this case, we're going to put a mutex around a Boolean, and that Boolean is going to live between our Future and a new thread that our Future spins up, and that thread is then going to sleep for however long we decide we want to wait. Then when that thread wakes back up, it will set the Boolean to true and tell the Future, "We're good to go," and calls to the poll method of the Future, which we're going to talk about in one second, will check that Boolean to see if we've woken up yet or not.

This means you have a thread per timer, super inefficient mutex around a Boolean, also super inefficient, probably we would want to use an atomic integer instead. I'm trying to keep this very simple, like I said; a real implementation would integrate with your operating system to set up a timer and do all that stuff.

I've mentioned Futures a bunch so far. A Future is a trait in Rust. That means it's an interface against which you can build different things. The interface looks like this; there are two main components. First of all is the trait Future itself. It has an associated type called Output, and that's the value that you will get from the Future if it ever exists. Then there's this function called poll that takes two arguments, itself and a context - we're going to ignore those completely for now. It returns this enum called Poll with the output type, and the Poll enum is defined as either ready or pending. The idea is that we call poll and then we get back either a thing that says, "Yes, you're ready, here's the value," or you get back a thing that says, "We're not ready yet."
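The interface being described here is `std::future::Future`. As a rough sketch, here is the trait's shape (quoted in comments) together with the simplest possible implementation - an always-ready future - polled by hand. The names `Ready` and `NoopWaker` are ours, not from the talk:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The trait under discussion, as defined in std::future:
//
//     pub trait Future {
//         type Output;
//         fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
//     }
//
// and the enum it returns, from std::task:
//
//     pub enum Poll<T> { Ready(T), Pending }

/// The simplest possible future: it is ready immediately.
struct Ready(Option<u32>);

impl Future for Ready {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // Take the value out; polling again after completion would panic,
        // which matches the Future contract.
        Poll::Ready(self.0.take().expect("polled after completion"))
    }
}

// A waker that does nothing, just so we can build a Context to poll with.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(Ready(Some(42)));
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(42));
    println!("future resolved");
}
```

Real code rarely polls a future manually like this; an executor does it for you, as discussed later in the talk.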

You may be wondering about error handling. Previous iterations of this feature had error handling built in, but it's actually easier to compose this with standard Rust error handling idioms. What happens is, if you have a Future that can fail, then the output type will be what Rust calls a Result, which indicates success or failure. You compose the Result type with the Future's return type, and that makes things a lot cleaner and interoperable, and you don't need to represent different concepts of fallible versus infallible Futures and all that stuff. We experimented with those designs over the last couple of years and this is basically the cleanest and most orthogonal way to do it. Let's actually build a Future.
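A fallible future, then, is just a future whose Output is a Result. A small sketch (the function `parse_doubled` is a made-up example, not from the talk) - note that `?` works inside the async body exactly as it does in a synchronous function:

```rust
use std::future::Future;
use std::num::ParseIntError;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A fallible async operation: no special "fallible future" machinery.
// The Output is simply a Result, and `?` works inside the async body.
async fn parse_doubled(input: &str) -> Result<u32, ParseIntError> {
    let n: u32 = input.parse()?; // on error, the future resolves to Err
    Ok(n * 2)
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // There are no awaits inside, so each future resolves on its first poll.
    let mut ok = Box::pin(parse_doubled("21"));
    assert_eq!(ok.as_mut().poll(&mut cx), Poll::Ready(Ok(42)));

    let mut err = Box::pin(parse_doubled("not a number"));
    assert!(matches!(err.as_mut().poll(&mut cx), Poll::Ready(Err(_))));

    println!("ok");
}
```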

We're going to build this timer Future. We create a structure to actually represent the data that we need to hold, and inside it, we have this thing called shared state, which has an atomic reference counted thing wrapped around a mutex, wrapped around this thing called SharedState. SharedState is going to have two bits of data. First of all, that Boolean that I talked about, which is going to represent whether or not we've finished sleeping, and then finally, this thing called a waker. We're going to get more into wakers in a second, but we need to keep track of both of these bits of data in this design, and we're going to wrap them up in this Arc mutex, again, for simplicity's sake. This is actually super inefficient.

Then we actually implement Future for this timer Future. The output of our timer Future is not actually anything. All we're doing is just waking up. You would usually chain this Future with something else. You'd say, "Ok, when this is done, then do something else." We're not really producing a value out of the Future. On line two, where it says the Output type is the empty tuple - we're not actually producing anything.

Finally, line three is that type signature we just saw with poll. This is going to be the implementation: what happens when the runtime calls our timer Future and actually wants something to execute. Fundamentally, what happens here is we lock the mutex that is stored in our shared state so we can actually look at its contents. Then we check to see if it's completed; if that Boolean is set to true, then we return that we're ready with that empty tuple. If we're not ready yet, what we do is we stash a copy of the waker into that shared state. This comes from that context variable up top; this is specifically the end of line nine. We're taking a copy of the waker and we're storing it in our Future, and we'll get into the specifics about that in a second. You'll see how that stuff interacts. Then we return, "We're not actually ready yet." This is a very simple implementation: check the bool; if it's good, then we return. If it's not, then we say, "Hold on, we still need to wait."

Then we actually create our thread, which we spin up in a new method. We say we want to produce a new timer Future. We implement a method on it that takes a certain duration so we can parameterize how much time we want to wait. This is going to end up, first of all, constructing that shared state; as you can see on lines three through six, we set our Boolean to false and we put a None in where the waker goes, because we don't have one yet - the Future hasn't started actually executing. Then lines 9 through 18 are where we're spinning up our thread. We're producing a new thread with thread spawn, we pass a closure to it with what we want to execute in the body of the thread, and on line 10 there, you can see that the thread sleeps for however long that duration is that we passed in. Then after it's done, it will take the mutex on line 11 and lock it, set that completed Boolean to true, and then lines 15 through 17 are the interesting part with this waker. We take that waker that we had stashed in our Future and we call this wake function on it.
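The code being walked through above closely follows the TimerFuture example from the official Rust async book; a sketch of the whole thing, driven here by a crude busy-polling loop in main (a real executor would park until woken), looks like this:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::{Duration, Instant};

// State shared between the future and the sleeping thread.
struct SharedState {
    completed: bool,
    waker: Option<Waker>,
}

struct TimerFuture {
    shared_state: Arc<Mutex<SharedState>>,
}

impl Future for TimerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared_state = self.shared_state.lock().unwrap();
        if shared_state.completed {
            Poll::Ready(())
        } else {
            // Stash a copy of the waker so the timer thread can wake
            // the task when the sleep is over.
            shared_state.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

impl TimerFuture {
    fn new(duration: Duration) -> Self {
        let shared_state = Arc::new(Mutex::new(SharedState {
            completed: false,
            waker: None,
        }));

        // One thread per timer: deliberately simple, not efficient.
        let thread_shared_state = shared_state.clone();
        thread::spawn(move || {
            thread::sleep(duration);
            let mut shared_state = thread_shared_state.lock().unwrap();
            shared_state.completed = true;
            if let Some(waker) = shared_state.waker.take() {
                waker.wake();
            }
        });

        TimerFuture { shared_state }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let start = Instant::now();
    let mut timer = Box::pin(TimerFuture::new(Duration::from_millis(50)));
    // Busy-poll until ready; the executor built later does this properly.
    while timer.as_mut().poll(&mut cx).is_pending() {
        thread::sleep(Duration::from_millis(5));
    }
    assert!(start.elapsed() >= Duration::from_millis(50));
    println!("timer fired after {:?}", start.elapsed());
}
```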

This is an interesting part. I have a diagram later for how all these bits interact, but I decided to wait until after we talked about the Future. Basically, there's this concept called the waker, and before your Future returns that it's ready, it needs to call this wake function to let the Executor know that it's happening. We haven't gotten there yet; I promise we will. For now, just know that Futures need to interact with this API and they have to call wake to awaken themselves. Then finally, we return the Future that we have. That's it.

There are a lot of moving parts going on here, but fundamentally, we have our thread, we have our structure, the thread sleeps, and that's it. You don't need to write stuff like this in everyday Rust, because other people have written libraries that do it for you. This is an academic exercise, but I wanted to show you the bits and pieces of how these move together. We're going to build some stuff to actually execute this Future later in this talk, and so I wanted to show you the very basic implementation of this thing.

Four Rules

Next up, briefly, I wanted to take a small detour and give you four rules for when you're using async/await as a user, to build an intuition for how you would actually use it yourself before we dig into the guts of the next big concept here. If you write an async function foo, let's say it takes a string as an argument and returns a 32-bit integer, then that's syntax sugar. What it's sugar for is a regular function that still takes that string as an argument but returns some implementation of a Future that has that 32-bit value as its output. You can think of it as: an async function returns some anonymous Future struct. Why would you not just write that impl Future yourself? There are a lot of reasons that we're going to get into in a second, but the point is that mentally, this is how you think about async/await as a user: an async function returns a Future, but I don't have to worry about that. What that means is if you have some sort of Future that you get from a library or from another async function and you want to get the value out of it, you call await on that Future. This can work with any Future from any source, it doesn't really matter, and that's how you pull the value out: by calling await.
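The first rule can be sketched directly. The two functions below are conceptually equivalent (the names `foo` and `foo_desugared` are ours); an async block is one way to produce such an anonymous Future type on stable Rust, and polling both by hand shows they behave identically:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Rule 1: this...
async fn foo(name: String) -> u32 {
    name.len() as u32
}

// ...is (conceptually) sugar for this: a plain function returning
// some anonymous type implementing Future<Output = u32>.
fn foo_desugared(name: String) -> impl Future<Output = u32> {
    async move { name.len() as u32 }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Neither future contains an await point, so both resolve on the
    // first poll - and nothing ran before we polled (rule 4).
    let mut a = Box::pin(foo(String::from("hello")));
    let mut b = Box::pin(foo_desugared(String::from("hello")));
    assert_eq!(a.as_mut().poll(&mut cx), Poll::Ready(5));
    assert_eq!(b.as_mut().poll(&mut cx), Poll::Ready(5));
    println!("both futures returned 5");
}
```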

You can only call await inside of an async function or an async block, because it's building up this chain of Futures, which is the next subject we're about to talk about. You can't just await on a random value anywhere. You have to do it inside one of these function bodies, and that's why it's called async/await - because you can't really use await without also using async.

The last interesting one before we get into some more of the details of how this system works: this is the biggest thing that trips up people coming from a language like Node.js, with its version of promises and async/await. That is: Futures don't execute until you pass them to something that's conveniently called an Executor. In languages like Node, there is a runtime that exists in all programs, and there's a thing called the event loop, which we're going to talk about in a minute, and it executes whatever promises you build with async/await automatically. As soon as you create a promise, it will get put on the microtask queue, and then it will end up being executed.

Rust works a little bit differently - when you call an async function and you get a Future back, it doesn't start executing until you explicitly tell it to. This is a slight change in mental model, but it leads to some pretty extreme efficiency gains that are not really possible otherwise. We're going to get into what those are, but before I do that - we said that you can't execute a Future unless you pass it to an Executor, but when I showed you the code earlier with Tokio, there was no Executor and no passing of a Future to it. How does that work?

There is this magic little attribute called tokio::main that lets you write an async function main, and basically, that's a little macro that takes your main function, grabs the Future out of it, submits it to an Executor, and then calls await. This means you don't actually have to do it yourself, and you can think of this as more of a Node.js style of writing things. These are built as convenience functions in libraries because we don't have a language built-in event loop. You want to be able to use libraries that have implemented these things in different ways, because some people are writing web applications, and they don't care that much about dynamically allocating memory, but they certainly care about network performance. Other people use async/await on embedded devices, and they really care about not allocating memory dynamically. They may not want an unbounded queue of possible tasks, because that means more dynamic memory allocation, and so there are a lot of options in the space that Rust is in. We can't build these things into the language, and that means you end up using these library features to make it more ergonomic. We'll actually show submitting a Future to an Executor manually once we write an Executor in a bit.
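To make "submitting a Future to an Executor" concrete before the full Executor later in the talk, here is a minimal hand-rolled, single-future executor using only the standard library. This is not what #[tokio::main] actually expands to - Tokio sets up a full runtime - but it is the same idea in miniature: block the current thread, polling the future and parking between polls:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the executor's thread so it polls again.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Run a single future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until some waker calls unpark().
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    // Submitting a future manually: nothing inside the async block
    // runs until block_on polls it.
    let answer = block_on(async { 40 + 2 });
    assert_eq!(answer, 42);
    println!("answer = {answer}");
}
```

Parking and unparking the thread is what separates this from a pure busy-loop: the executor sleeps until a waker signals that the future can make progress.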

There's one last part of this implication that trips up folks coming from Node, or even just JavaScript in the browser, and that is this example. Here, this is an actual timer Future that someone wrote, called Delay, instead of the one that we implemented. I wrote this function called i_sleep that basically sleeps for five seconds. If you write this code where on line six, we call i_sleep once, and then on line seven, we call i_sleep again, we get two variables, so we have two Futures, and we await them on lines 9 and 10. If you're used to coming from JavaScript, you might think that this function would take roughly five seconds to execute, because as soon as you called i_sleep, it would start processing those two Futures in the background. But because nothing happens until you submit stuff to an Executor, and await actually makes things happen sequentially, this will actually wait for the full first Future to complete, and so that'll take five seconds, and then it'll wait for the full second Future to complete, and that'll take another five seconds.

This is a thing that definitely trips people up sometimes when they're getting started. If you want to emulate the Node behavior, there's a function called join that will take two Futures and await them in parallel. If you want parallelism, you need to expressly put it in, in the Rust model. Probably the biggest shift coming from something like JavaScript to something like Rust is that you don't get this automatic execution. It's extra unfortunate because Rust has this reputation of "if it compiles, it works," and so a lot of people coming in from JavaScript will write their first program with Futures, and it compiles, and they run it, and then it doesn't do anything, and they're like, "I thought if it compiled, it actually works." It turns out that if you don't submit the Future, Rust will be like, "Sure, that computation you're describing is typed perfectly. Everything is set up, you're good to go," and then does not actually execute it, so we just return and don't do anything, and that can be a little surprising.
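The sequential behavior can be demonstrated with std alone. Below, `LazySleep` is a hypothetical timer standing in for the talk's Delay/i_sleep: its clock starts on the first poll, not at creation, so awaiting two 50ms sleeps in a row takes ~100ms rather than ~50ms. (Overlapping them would need a join combinator such as futures::join! from the futures crate, which is not shown here.) The executor is a toy that busy-polls via wake_by_ref:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::{Duration, Instant};

// Hypothetical lazy timer: nothing happens until it is first polled.
struct LazySleep {
    dur: Duration,
    deadline: Option<Instant>,
}

impl Future for LazySleep {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let dur = self.dur;
        // The clock starts on the first poll, not at creation time.
        let deadline = *self.deadline.get_or_insert_with(|| Instant::now() + dur);
        if Instant::now() >= deadline {
            Poll::Ready(())
        } else {
            // Toy tactic: ask to be polled again right away (busy wait).
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let start = Instant::now();
    block_on(async {
        let a = LazySleep { dur: Duration::from_millis(50), deadline: None };
        let b = LazySleep { dur: Duration::from_millis(50), deadline: None };
        // Creating the futures ran nothing; each await drives one future
        // to completion before the next one even starts its clock.
        a.await;
        b.await;
    });
    // Sequential awaits: ~50ms + ~50ms, not ~50ms total.
    assert!(start.elapsed() >= Duration::from_millis(100));
    println!("sequential awaits took {:?}", start.elapsed());
}
```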

Those two things in combination are important to remember but the reason that this happens is what I'm about to get into now, which is that because Rust is really interested in type system stuff and efficiency, it means that we do a lot of static analysis of how you're using async and await to build up the maximally efficient thing that you possibly could execute.

The issue with the model that JavaScript uses, which is great for it as a language, is that in order for the Future to start executing as soon as you call it, we would need to actually allocate that as a task and then start executing it in the background while we create new values. Rust wants to take your entire computation the whole way through, everything you've chained together, and then compile it down into something super mega ultra-efficient, and you couldn't do that if your computation started running before you've even pieced the whole thing together. We're going to talk about how that implementation actually works now.

Generators aka Stackless Coroutines

Part two: Generators, also called stackless coroutines. There are a million different names for the features in this area and no one has consistent terminology whatsoever, so I apologize. Names are hard. We call them Generators; other people call them stackless coroutines. Some people are like, "Coroutines don't return a value and Generators return a value, so a Generator is a kind of coroutine." Whatever, I don't actually care about arguing about names. The point is, we call ours Generators and it's fine.

Generators are not a stable feature of the Rust language yet. They're in Nightly and you can try playing around with them and so what I'm going to show you is the Nightly only syntax, but I'll make it clear that you can't actually implement Generators by hand on stable Rust yet. If you are watching this talk a year from now, maybe this all has changed. I just want to flag that as a thing, but it's important because Generators are how async/await is implemented under the hood. That's why we put them in the language - because the compiler uses this feature to do that efficient computation based stuff I was talking about. In order to truly understand the details, you have to know a little bit about this experimental feature.

At their core, Generators are sometimes called resumable functions. That never really made sense to me; if it makes sense to you, awesome. Basically, what happens is, in Rust's case, we make this Generator and it looks like a closure. You have the two pipes; this Generator takes no arguments, and in this case, we're taking a vector of the numbers one, two, and three, and we're summing them up, but we're returning the intermediate results each time. You can see there's this for loop where we add the current value onto the sum and then we yield sum.

What happens is, when you invoke gen for the first time, it will basically run up to the yield and then return that first value, and you get the value back. Then you can invoke the Generator again and it will continue execution, run through the second iteration, and then return the second value, and then again and again. That's why it's called a Generator - because you're generating these sequences of values. Fundamentally, how this works is, it looks like a closure, but you put yield in there. This is a feature that's in a lot of languages under a lot of names. I have used Ruby, so this made intuitive sense to me even though we never called them Generators there.

Here's a slightly more complicated Generator that basically does the exact same thing. We're going to be first adding up all the numbers and then secondly, subtracting all the numbers by going back down. The reason that I wanted to use this slightly more interesting Generator is I want to show you how Rust actually compiles this generator into efficient code behind the scenes and we needed something a little more complicated than just one yield.

If you think about how you would try to implement a feature like Generators, you need to save the intermediate state between producing each value. What happens here is, on line four, we have an iterator. Iterators have internal state. We mark this with the comment iter0. Up until the first yield, we're going to need to save the value of this particular iterator. Then on line six, where we yield that sum, we're also going to be giving this value back, and that's going to be the starting value for the next iteration of the computation. We're going to need to save both the state of the iterator and the value that we yielded in the first step through the Generator.

Then, likewise, on lines 8 and 10, when we reverse through the iterator, we're going to need to save data at every single step, and we're going to need to save the sum the second time. We have these two big sections of yields with two little bits of state in between them. You can actually think about it in Rust code. There are these four stages as we pass through the states. First of all, there's the Generator that we've never called, that we've never actually asked to execute, and that holds the state of a list of 32-bit integers.

Then after we call it for the first time, we move from the unresumed state into the suspend0 state. Now, we still have our vector of integers, but we also have the iterator and we have the intermediate sum. After we totally exhaust that, we move into the suspend1 state, and that's similar: we need the list that exists, an iterator that's going backwards now, and the current sum state. Finally, once this computation is done, there's this return state.

There could be a totally different talk on this computer sciencey track about how coroutines and Generators relate. If you squint at this, it looks like a state machine. We're progressing through these states, and there's a really beautiful underlying unified theory of computation that puts all these things together in a wonderful way that I don't have time to explain. Basically, Generators are this really convenient way to write a state machine, and this is what the state machine would look like desugared.
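The desugared state machine described above can be written by hand on stable Rust. The sketch below mirrors the four states for a generator that yields running sums of [1, 2, 3] up and then subtracts back down; the state names are hypothetical (the compiler's generated states are anonymous), and indices stand in for the iterators:

```rust
// A hand-written version of the state machine the compiler generates.
enum SumGenerator {
    // Never resumed: just the captured data.
    Unresumed { nums: Vec<i32> },
    // Suspended in the first loop (summing up).
    Suspend0 { nums: Vec<i32>, i: usize, sum: i32 },
    // Suspended in the second loop (subtracting back down).
    Suspend1 { nums: Vec<i32>, i: usize, sum: i32 },
    // Finished.
    Returned,
}

enum GeneratorState {
    Yielded(i32),
    Complete,
}

impl SumGenerator {
    fn resume(&mut self) -> GeneratorState {
        loop {
            // Take the current state out, deciding the next one below.
            match std::mem::replace(self, SumGenerator::Returned) {
                SumGenerator::Unresumed { nums } => {
                    *self = SumGenerator::Suspend0 { nums, i: 0, sum: 0 };
                }
                SumGenerator::Suspend0 { nums, i, sum } => {
                    if i < nums.len() {
                        let sum = sum + nums[i];
                        *self = SumGenerator::Suspend0 { nums, i: i + 1, sum };
                        return GeneratorState::Yielded(sum);
                    } else {
                        // First loop exhausted: fall into the second loop.
                        let i = nums.len();
                        *self = SumGenerator::Suspend1 { nums, i, sum };
                    }
                }
                SumGenerator::Suspend1 { nums, i, sum } => {
                    if i > 0 {
                        let sum = sum - nums[i - 1];
                        *self = SumGenerator::Suspend1 { nums, i: i - 1, sum };
                        return GeneratorState::Yielded(sum);
                    } else {
                        return GeneratorState::Complete;
                    }
                }
                SumGenerator::Returned => return GeneratorState::Complete,
            }
        }
    }
}

fn main() {
    let mut gen = SumGenerator::Unresumed { nums: vec![1, 2, 3] };
    let mut yielded = Vec::new();
    while let GeneratorState::Yielded(v) = gen.resume() {
        yielded.push(v);
    }
    assert_eq!(yielded, vec![1, 3, 6, 3, 1, 0]);
    println!("{yielded:?}");
}
```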

What this means is that because this is an enum of structs, the Generator has a maximum size in memory of the biggest chunk of state that you need to save at any given point of the computation, but we can reuse that space for the smaller ones, and so you get this really compact, nice, and tiny representation of the value, which is really efficient, and that will become important later. Anyway, that's a quick whirlwind summary of Generators and what the code they generate looks like. The way this interacts with Futures in async/await: Futures need to have a poll method that's called over and over again until some value is produced.
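The size claim is easy to check: an enum occupies roughly its largest variant plus a discriminant, not the sum of all variants, which is why the generator's suspend states can share one compact slot. The types below are illustrative, not from the talk:

```rust
use std::mem::size_of;

// The whole enum is about as big as its biggest variant, not the sum
// of its variants: all variants share the same storage.
#[allow(dead_code)]
enum State {
    Small(u8),
    Big([u8; 64]),
}

fn main() {
    // At least as big as the biggest variant...
    assert!(size_of::<State>() >= 64);
    // ...but nowhere near the sum of the variants plus padding.
    assert!(size_of::<State>() < 64 + 8);
    println!("size_of::<State>() = {}", size_of::<State>());
}
```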

Generators let you invoke them, and they yield a value over and over to produce a sequence of values. There's this interesting symmetry here, and that's why async/await is implemented internally as a Generator that implements the Future trait. Basically, async/await is a simpler syntax for not just Futures but, more specifically, a Generator that implements Future. The state transitions are driven by the values that are returned by the calls to poll, and everything fits together nicely, and that means that we can take your entire chain of async/await computation, compile it down into one single state machine that represents that whole entire chain of computation, put it in a very compact representation, and know at compile time exactly the size that that computation will take.

We call these stackless coroutines because they don't allocate their own stack, but you can almost think about stackless coroutines as perfectly sized stack coroutines. We don't need to allocate every single iteration through, like what happens with JavaScript. We can instead get the exact size that's needed for the whole computation all at once at compile time and be perfect. This is one of the reasons why Rust implementation is a little novel and also why it's super highly efficient - because you can't actually hand-code it better yourself.

There are definitely some optimizations that we're missing when it comes to compacting the Generator state, so there still is better work to be done. I also want to set some expectations there, but in a theoretical sense, we're doing as much processing at compile time as you possibly can to make this as zero overhead as possible, and that's really cool.

Tasks, Executors, & Reactors

We know how to implement Futures ourselves and we know how we can use async/await to work with Futures, but as I said, in order for a Future to execute, you have to put it on an Executor. Now, we're going to talk about the last remaining part of the system, another part that's interesting due to some further splits, and that's the Tasks, Executors, and Reactors chunk of the story.

Generators and Futures and async/await are all in the Rust language, but as I mentioned before, Rust doesn't have a built-in Executor, and that's because Rust is used in many different contexts and we need to give people choice. There are some people who don't even need asynchronous computation, and we don't want to make them pay for this stuff if they don't need it. The way this works is that the Future trait and Generators are in the standard library, async/await is in the language, but all the ways that you actually execute these things are left to the library ecosystem.

I showed you Tokio before. There are a number of other projects providing different Executors and Reactors that do different things for different areas: some of them are competing to be the best one for web apps, some of them are doing embedded, and all these other things. What I'm about to show you in this section is an implementation of a simple Executor, but again, the folks who are building the real libraries that you use in production do a lot of tricks to make this much faster. I'm optimizing for learnability, not for ultimate speed. The nice thing is that you can build one of these things yourself, use it in your toy program, and then rip it out and replace it with a production-ready one, and all your stuff just goes faster. Interfaces are awesome.

Node has this thing called the event loop. I bring up Node in this talk a number of times because Node was probably the primary inspiration for how we did all this stuff. A lot of people in the Rust world were JavaScript programmers before, and Node made this style of programming cool, even if it didn't ultimately pioneer it, so I tend to compare against Node because that tends to be what people know in the space. Node programmers will talk about the event loop, and that's the thing that makes your asynchronous computation happen. Whether you're a regular person writing Node code, who doesn't want to think about how the event loop works and just knows "don't block the event loop" as a meme, or you're someone on the Node TSC and a contributor to libuv who knows all the guts, you talk about it in one unified way.

In Rust, because we don't have one guaranteed thing that ships with the language, we actually broke it into three different concepts, and we're going to use a fourth minor one later. When we talk about asynchronous work, there's this idea of a Task. In the case of Rust and async/await, a Task specifically represents some chain of Futures that you've made into a computation, and you're, "I want to actually execute this thing now." You hand a Task off to an Executor to be scheduled and run. The Executor is just a scheduling algorithm: it takes a list of Futures that it knows need to run, and it figures out how to run them. There are all sorts of different strategies you can use; we're going to use a very simple one based on a first-in, first-out queue. Other people build multi-threaded work-stealing Executors and all kinds of other things, so there's a lot of space to explore in scheduling strategies that are good for different workloads.

Then finally, there's this concept called a Reactor. This would be what libuv is in the Node world. It's the thing that notifies the Executor which tasks are actually ready to continue executing. This is often tied to a specific operating system API, and we split these things apart conceptually even though a lot of projects provide both. For example, Tokio gives you both an Executor and a Reactor, but in theory, you could pair Tokio's Reactor with a totally different Executor if you didn't like Tokio's Executor. They're separate things, whereas all of this is bundled up into one thing in many other systems. That gives us flexibility, but it also means intense type system stuff.

We ignored some of the details about the call to poll earlier, but if you read the type signature, you can see how this works with the Executor and Reactor. Tasks implement Future, so when you pass a Task to an Executor, the Executor's job is to do that repeated calling of poll. It's going to be calling poll to make sure that your Futures are doing the work they're supposed to do, and it gives this context argument to the Futures. What this context argument actually is was the source of a lot of churn in designing the Future trait, but what it currently contains is the interface into the Reactor: that waker concept I referenced before. The Executor says, "Ok, you're using this particular Reactor. I bundle it up in this context and I pass it along to the Futures." The way the Future is able to interact with the Reactor is through that waker interface, and that's part of this context chunk of stuff.

There are four levels of indirection here, so it takes a little while for this to sink in. Honestly, one of the reasons I wanted to do this talk was to force myself to walk through all these details and make sure that I got it, because I knew it at a high level before but I wanted to do a bunch of the research. An Executor calls poll, poll gives you a context, a context contains a waker, and that waker is how the Reactor talks to a Future.
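As a small, hedged illustration of that relationship (the type and field names here are made up for the example), here is a Future that uses the waker inside its Context to ask to be polled again, along with a tiny scaffold that stands in for an Executor:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A Future that reports Pending exactly once. Before returning Pending, it
// uses the waker the Executor handed it (inside the Context) to request
// another poll; in a real system, a Reactor would call wake when I/O is ready.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // "please schedule me again"
            Poll::Pending
        }
    }
}

// Tiny scaffold standing in for an Executor: poll once with a no-op waker.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

The first poll returns Pending after requesting a wake-up; the second returns Ready. A real Executor would only poll again once that wake-up actually fires.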

Let’s Build an Executor

Let's actually build an Executor together. I'm going to show you a very simple Executor. Again, it's super inefficient but actually works. Before we get into the code though, we're going to do a good old fashioned flowchart of all these bits and pieces because, like I said, there's a lot of moving bits around here.

If we have an async function foo that we want to turn into a task somehow, we're going to call this interface called a Spawner and spawn foo. If you noticed, there was that API in Tokio earlier, tokio::spawn, that let you spin up a new task; tasks can also produce new tasks, and the things that create tasks are sometimes nicknamed Spawners. You don't have to make them a whole separate thing, but we're about to do that in our particular implementation. We hand foo off to this interface that says, "Please make a task out of the result of foo." From there, that's going to be sent to the Executor, and in our case, we're going to have a queue of tasks. We put things into this queue, and the Executor goes through the queue and executes each task in turn, holding on to that state.

When the Executor calls poll on a Future that happens to be in its queue, that's how it executes the Future: call poll and see what happens. That Future will either return ready, in which case it's done and the Executor removes it from the queue, or it'll say, "I'm not ready yet," and the Executor will say, "Ok," and it won't keep that Future in the list of things it needs to poll. It's basically saying, "I'm not going to call poll again until I'm told that you're ready to continue executing." This is where the waker bit comes in.

At some point, whenever the Future is ready to make progress, something is going to call wake in the background. If you think back to our timer Future, this happened in that background thread, and that's why that setup works. You may wonder, "If the Executor never calls poll again, how am I supposed to even run the code for the waker?" It depends on what you're doing. In our timer example, it was that background thread. If you were building a full-blown Reactor, it would maybe be, "You've handed off a callback to epoll that's going to invoke the waker when the event happens." In some way, the Future is going to be told, "I'm ready to proceed," and that wake-up causes the Future to be put back onto the ready queue of the Executor to be executed again. That's the basic control flow of how the system works.

Let's actually put one of these things together. As I mentioned, we have our Executor and it has a ready queue, and there's going to be a channel; that's what the receiver part is. We're going to be sending things to the Executor over this channel, and we're going to treat our channel as a queue, because you can do that. When people put stuff on the channel, the channel maintains an internal queue and we pop things off, so we're just going to keep listening to the channel to implement this particular queue, and what comes off it is a reference-counted task.

A task contains the Future, and this has a frankly bonkers type signature: a Mutex of an Option of a BoxFuture with a 'static lifetime and no output. BoxFuture is itself a type alias, so this could actually expand into two or three more nested types. Types are handled at compile time, not runtime; the more types you put together, the more compile-time stuff happens and the faster everything is. Isn't that totally true? The point is that we need this to be thread-safe. Actually, we don't even need the mutex here because we're not building a multi-threaded Executor, but Rust forces you to make your code thread-safe to begin with. If we were making a single-threaded Executor, we could eliminate the mutex with some unsafe code, but I'm trying to make this simple, not complicated, so it's some overhead that we wouldn't actually need in production. Don't worry, your Futures are not all wrapped in mutexes and boxes and stuff.

Then finally, a task is also going to have the other end of that channel. When a task needs to reschedule itself, it's going to send itself down this channel, so it needs a reference to it, so it knows where to send it. That's the other component of our task and Executor. Then to make it a little easier to spawn tasks on the Executor, we're going to split out the Spawner type. You don't have to do it this way, but it can be easier to send the Spawner around rather than moving the Executor everywhere, and it holds the other end of this channel that is in the Executor. The sender sends stuff down the channel, the Executor pulls stuff off the channel, and then actually runs the Futures.

We made this little helper function called new_executor_and_spawner that spins up an Executor and a Spawner that know how to talk to each other. In this case, because we're using this particular channel, we're going to give it a maximum of 10,000 queued tasks, because this is a toy. Most real implementations will give you an unbounded queue if you want that behavior, and all this different stuff. This is how we'll get our Executor and Spawner.

Spawner is relatively straightforward. We take the Future for the task we're going to create, box it up, put it inside that Arc, give it that reference to the particular channel that we're sending stuff down, and then send it down the channel as, "This is ready to execute." Relatively straightforward.

Then we implement the wake method on our task. This would be where you'd put more complicated logic if you were doing something that interacts with the operating system. Here, whenever a Future calls wake, we send the task back down that channel to be executed immediately. It's a very simple system: anytime you call wake, we're not going to sleep anymore; we immediately re-queue you to be executed.

Finally, here's the guts of the Executor itself. This is the last big chunk of code slide. An Executor is a while loop. We have a while loop and we receive stuff off of our queue there on line three. When we get something off the end of the queue, we have that task, we use the mutex to grab access to it, and we take the Future out of its slot in the task, because we want to see if this Future still needs to be executed or not. When the Future is done, the slot will no longer hold a Future, so we have to double-check whether it's finished. That's what line eight does: it checks, "Does the Future still exist?" If it does, we rip it out of the task. Lines 10 and 11 set up the waker: we take the particular waker that we implemented and turn it into a context, because that's what needs to be passed to a Future. Then finally, line 13 is where we call poll, pass in that context, and see what happens. If it's pending, we're still processing the Future, so we need to put the Future back in the task, since we ripped it out before. We put that back in and we're good to go. If the Future is done executing, we don't need to put anything back, and we're just finished. That's really it; this is ultimately the guts of the Executor. You loop over the queue, you run the Futures, you check what they're doing, and that's it.

This is a relatively small number of lines of code, but it lets us actually implement our own async/await stuff. Here's an example of using the Executor we just defined. We call new_executor_and_spawner to get a new Executor and a new Spawner. We call spawner.spawn, and in this case, we use an async block to just print a message, sleep for two seconds, and then print another message. Then finally, on line 15, we drop the Spawner, and the reason is to tell the Executor we're done. Our Executor implementation waits until the channel closes, so it will always sit there. If we didn't signal that we're done spawning tasks, it would just sit there waiting for tasks forever and our program would hang.

Then we call run on the Executor and it will take that task that we added and do its magic and run it. This is the minimal code example of how to put all these bits together and it shows you how the control flow works through the system and the way that all these different parts work together.
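To tie the whole section together, here is a hedged, standard-library-only sketch of the toy system just described. The slides use helpers from the futures crate (BoxFuture, ArcWake, waker_ref), so the exact types here differ slightly, and the line numbers mentioned above refer to the slide code, not this sketch:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Wake, Waker};

// A Task owns its boxed, pinned Future plus a handle for re-queueing itself.
struct Task {
    future: Mutex<Option<Pin<Box<dyn Future<Output = ()> + Send>>>>,
    task_sender: SyncSender<Arc<Task>>,
}

// "Waking" a task just sends it back onto the ready queue; this is the
// one-line no-op Reactor discussed later in the talk.
impl Wake for Task {
    fn wake(self: Arc<Self>) {
        let clone = self.clone();
        self.task_sender.send(clone).expect("too many tasks queued");
    }
}

struct Executor {
    ready_queue: Receiver<Arc<Task>>,
}

#[derive(Clone)]
struct Spawner {
    task_sender: SyncSender<Arc<Task>>,
}

// Build an Executor and Spawner that share a bounded channel as the queue.
fn new_executor_and_spawner() -> (Executor, Spawner) {
    let (task_sender, ready_queue) = sync_channel(10_000);
    (Executor { ready_queue }, Spawner { task_sender })
}

impl Spawner {
    // Box the future, wrap it in a reference-counted Task, and queue it.
    fn spawn(&self, future: impl Future<Output = ()> + Send + 'static) {
        let task = Arc::new(Task {
            future: Mutex::new(Some(Box::pin(future))),
            task_sender: self.task_sender.clone(),
        });
        self.task_sender.send(task).expect("too many tasks queued");
    }
}

impl Executor {
    // The Executor is a while loop: pop a task, poll it, and either finish
    // it or put the future back until its waker re-queues the task.
    fn run(&self) {
        while let Ok(task) = self.ready_queue.recv() {
            let mut slot = task.future.lock().unwrap();
            if let Some(mut future) = slot.take() {
                let waker = Waker::from(task.clone());
                let mut cx = Context::from_waker(&waker);
                if future.as_mut().poll(&mut cx).is_pending() {
                    *slot = Some(future);
                }
            }
        }
        // recv() fails once every sender is dropped, which ends the loop.
    }
}
```

Dropping the Spawner (and finishing every Task) closes the channel, which is exactly the "drop the Spawner so the Executor knows we're done" step from the slide.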

A Quick Aside About Pin

We didn't talk about Pin. Pin was part of the function signature of Future; if you noticed, I said Pin self, and we didn't talk about it. Pin deserves its own whole talk; it's really interesting and complicated. The idea is that before a Future starts executing, we need to be able to move it around in memory. For example, to create a task out of a Future, we often put it on the heap, so we need to move it from the stack to the heap, which involves moving it in memory. The problem is, Rust is a systems language without a garbage collector, so we have pointers, and if you move something while pointers to it exist, you invalidate those pointers and everything blows up, so that's not great. Once a Future starts actually executing, it can't move in memory for similar reasons: the compiled state machine can contain references into itself, and moving it would cause all those references to be invalidated. So before it executes, we need to move it, but after it starts executing, it can't move at all. The Rust borrow checker really doesn't like this kind of "sometimes you can move stuff and sometimes you can't."

One person, withoutboats specifically, came up with this idea of the Pin type. What Pin basically says is, "Once you construct a Pin out of a pointer, the thing it points to will never move again." This lets us have stuff that's not pinned at first, and then part of the job of creating a task is to pin that Future into memory, to say, "This is going to live here forever, so you're free to start executing yourself." There's also a trait called Unpin that says, "I don't have any self-references; I don't care about being moved around." This is very similar to how Copy says a type doesn't participate in move semantics in the rest of Rust. That's really all I have to say about Pin today. Pin is something you'll virtually never use unless you're building a specific low-level Future by hand, and so it's complicated. You basically never have to think about it as a user, but the people who write your libraries do, so I didn't want to leave it out.
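As a rough sketch of those two situations (the helper functions here are made up for illustration; they are not std APIs):

```rust
use std::future::Future;
use std::pin::Pin;

// 1. Creating a task typically moves the future to the heap and pins it
//    there; from this point on, the state machine can never move again, so
//    any self-references inside it stay valid.
fn pin_to_heap<F: Future>(fut: F) -> Pin<Box<F>> {
    Box::pin(fut)
}

// 2. Types that implement `Unpin` don't care about being moved, so pinning
//    a plain reference to them is trivially safe, a bit like how `Copy`
//    opts a type out of move semantics elsewhere in Rust.
fn pin_in_place<T: Unpin>(value: &mut T) -> Pin<&mut T> {
    Pin::new(value)
}
```

Note that `Pin::new` is only available when the type is `Unpin`; pinning a possibly self-referential future safely requires putting it behind a pointer first, which is what `Box::pin` does.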

Let’s Build a Reactor

Let's build a Reactor. Just kidding, we're not actually going to build a Reactor. Just kidding again: we technically did build a Reactor. Our Reactor was very dumb, and all it did was immediately say, "Yes, you're going to start executing again," but this part, implementing the wake trait, is where a Reactor fits into this hierarchy of stuff. If you didn't want to just sleep, you wanted to wait on epoll or do whatever, that's where this wake machinery comes in, and that's the work of building a Reactor. We did technically write a one-line no-op Reactor, but I'm not going to get into a complicated one because, frankly, I've shown you far too many giant code blocks in this presentation.

There's no real way to do it without getting into the details, and this could be its own whole thing, a whole "how to implement libuv" talk. We don't actually use libuv ourselves; Tokio uses a library called mio, which stands for Metal IO and is a libuv competitor. The point is, there are a lot of rich, interesting problems here, and I'd never do them justice, so we're just going to leave it at that for this talk.

Bonus Round: async fn In Traits

I have five more minutes, so I'm going to go over the bonus round, because I do have time and because I think it's really interesting. I want to talk about one of the future extensions to async/await, because you may hear people talk about type theory questions and wonder how that's relevant to you as a programmer, and this is the easiest way I can explain it: async functions in traits. Right now, as of last week, you can write async functions in Rust, but if you want to write a trait with an async function, you can't. That feels really weird and awkward at first, because you think, "What's the big difference between a function in an interface as opposed to a function outside an interface?" The core of it comes down to this: a function is just one single function, which is not a big deal, but a trait is implemented on many types and so ends up being many functions, and as we know, when you go from one of a thing to as many as you'd like, that's where complexity always creeps in. This is where the type system complexity ends up being a thing. We haven't built the type system machinery to handle this situation yet.

You may ask, "Why does that matter?" Here's a trait called Foo that has an async function called foo that returns a 32-bit value. If you think about how this would desugar, in the same way we talked about how async functions desugar into something that returns an implementation of Future, it would look like this: we have an associated type that would be a Future of some kind, and the method returns that associated type. The async fn would be syntax sugar for that. This is fine, and you can actually write this today, but it only works for extremely simple things, and it gets way more complicated. The reason is lifetimes. In Rust, lifetimes always end up complicating everything, which is why the language exists, but also, "Lifetimes, yes!"
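A hedged sketch of that desugaring (the trait, type, and method names are all illustrative; the `poll_once` helper is only test scaffolding):

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Sketch: `trait Foo { async fn foo(&self) -> i32; }` would conceptually
// desugar into an associated Future type returned by an ordinary method.
trait Foo {
    type Fut: Future<Output = i32>;
    fn foo(&self) -> Self::Fut;
}

// Simple cases work today: an implementation names a concrete future type,
// like std's already-completed `Ready`.
struct Immediate;

impl Foo for Immediate {
    type Fut = Ready<i32>;
    fn foo(&self) -> Self::Fut {
        ready(32)
    }
}

// Tiny test scaffold: poll once with a no-op waker.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

This compiles because nothing here borrows self; the trouble described next begins as soon as the returned future needs to hold a reference to self.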

If we think about a more interesting trait, let's say we have a Database trait that will fetch users: we're building an ORM and want to get a user out. You'll notice that we're borrowing self here. That borrow means there's a lifetime, and so in our very simple example, where an async function desugars into just returning impl Future, we also need to add the lifetime; it's actually an impl Future that refers to the lifetime of self. That's the root of all these problems, because if you desugar that into the general trait form I talked about before, you now have a type that's parameterized by a lifetime implementing a Future parameterized by a lifetime, and there's currently no way to parameterize an associated type.

You may have heard people refer to associated type constructors in Rust, or maybe generic associated types; this is basically the thing they're talking about. You can't make line two generic in today's Rust, but due to lifetimes, any useful trait that's asynchronous needs to know what the lifetime is, so it needs that feature, and the feature doesn't exist yet. This is the core reason why async fn doesn't work in traits. We need more type theory shenanigans, and other people who are smarter than me are going to go do all that, and I'm just going to write an async function in a trait and be, "It works."

It actually gets even way more complicated than that, but I'm not going to get into that stuff. What I am going to tell you is that if you want to use async functions in traits today, you can use this package called async-trait, and it implements them for you. You may wonder, "How can it not exist in the language but a library can implement it?" Basically, what happens is you add this async_trait attribute onto your trait, and instead of desugaring it into that thing with the associated type, it gives you a box; it's going to box every single invocation. The downside you pay for using this library is that your Futures are a little less efficient, because you're allocating for every single Future, but the advantage is you get to have traits with asynchronous functions.
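Roughly speaking, and simplifying what the macro really generates, the attribute turns each async method into an ordinary method that returns a boxed, pinned future, allocated on every call. The trait, types, and `block_on_ready` helper below are made up for illustration:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Sketch of the boxing trick: the associated type disappears because every
// implementation returns the same trait-object type, at the cost of one
// heap allocation per call. The lifetime 'a ties the future to &self.
trait Database {
    fn fetch_user<'a>(&'a self) -> Pin<Box<dyn Future<Output = String> + Send + 'a>>;
}

struct MockDb;

impl Database for MockDb {
    fn fetch_user<'a>(&'a self) -> Pin<Box<dyn Future<Output = String> + Send + 'a>> {
        Box::pin(async move { String::from("alice") })
    }
}

// Tiny test scaffold: drive a boxed future that completes without waiting.
fn block_on_ready<T>(mut fut: Pin<Box<dyn Future<Output = T> + Send + '_>>) -> T {
    fn raw() -> RawWaker {
        fn no_op(_: *const ()) {}
        fn clone(_: *const ()) -> RawWaker {
            raw()
        }
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => panic!("future was not immediately ready"),
    }
}
```

Because every call returns the same `Pin<Box<dyn Future>>` type, no generic associated type is needed, which is exactly the trade the async-trait crate makes.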

Thank you so much for listening to me ramble about how all these bits and pieces fit together and how all the types work. Hopefully, you understand a little bit more about how async/await is actually generated and how Futures are executed. The slides will be posted up on InfoQ later but I have these three blogposts that are really good and go into the examples that I showed you in more detail and with written words and I really recommend reading them to get additional context on how this stuff works.




Recorded at:

Dec 09, 2019