InfoQ Homepage Presentations WebAssembly: Open to Interpretation

WebAssembly: Open to Interpretation

View Presentation

Speed:

47:13

Summary

Rob Pilling discusses the foundational corners of WebAssembly.

Bio

Rob Pilling is lead developer at Scott Logic, a multifaceted UK-based software consultancy. He maintains an active interest in tech developments, having contributed to projects such as Rust and his favourite text editor, Vim.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Pilling: I'm Robert Pilling. I'll be doing a talk on WebAssembly. I work for Scott Logic where we deal with things like financial software or government work, and general bespoke consulting. I've dabbled in trading systems, like web apps backed by the cloud, and even done a few years of mobile development as well. In my spare time, I thinker with compilers, explore WebAssembly. I run an internal Rust meetup, where we are currently mob programming our way through writing a trade matching web server. This is great for knowledge sharing and discussion, which usually ends up going down loose chips, and rabbit holes, and irons out some creases in knowledge around language.

Outline

First, I'll be giving a high-level overview of WebAssembly. Then we'll get into the meat of things. We're going to write an interpreter for WebAssembly. To do this, we'll have a quick look at compilers, cover the three main parts of our interpreter: the stack, memory, and control flow. By the end of the talk, I hope to have enough of an interpreter cobbled together that we can do a little bit of math with it. Watch this space.

Why WebAssembly?

Why WebAssembly? I got a lot of people saying that the browser is a pretty specific place to focus on. WebAssembly has applications elsewhere. As you may have heard mentioned, server-side compute, Lambdas or functions as a service, we can use WebAssembly for that, which allows us to nicely encapsulate our code. We can also sandbox local code with it. You've probably heard of the supply chain attacks. Things like Node modules, if that was WebAssembly, it'd be perhaps a little bit easier to sandbox what we have at the moment, which is JavaScript in the browser, so that's sandboxed untrusted code. It's difficult to make it go quickly. We need a lot of infrastructure for that. WebAssembly makes this a little bit more trivial. Then we move on to things like micro-frontends in the web. At the moment, if you want to have different apps in your web program, these are all isolated from each other via things like web iframes. They talk using postMessage, which means all of the messages between these different micro-frontends need to be serialized, and it can be quite slow. WebAssembly allows us to run different apps all within the same process, so it's a lot faster. We can also target various different machine architectures with the same WebAssembly modules, which is really nice. LLVM is one of the tools that powers this. It's a code generator, so it can generate WebAssembly for Rust, Swift, Objective-C, and C++, amongst other languages. That's what allows us to have such a wide range of input languages. That's WebAssembly.

Compilers

We've covered that, so let's move on to compilers. WebAssembly works a lot like LLVM, so we can use it as an intermediate representation of our code. This is great. Any of those languages can now run in a browser in a local sandbox, or even say on a Lambda function as a service. We can build all of our code as WebAssembly, and leave the nitty-gritty details to the implementation. Browsers will then execute this or any WebAssembly horse, and we only need to ship a single binary. This is great, if a little confusing. I've thrown around a lot of acronyms, that's all well and good. I find the best way to understand something is to write a bit of a crummy version of it. I've done this with side-scrolling games, HTTP, download these text editors and a C compiler. If WebAssembly had been a thing back when I wrote my C compiler, I'd have probably not gotten nearly as deep into writing things like the code generator that churns out the machine code. WebAssembly allows us to avoid register allocation, stack frame layout, and calling conventions to an extent depending on the source language that we're building from. That's compilers in a pretty high-level view of where WebAssembly comes in.

Stack

Speaking of writing compilers, what's the simplest machine that we can write? A lot of you will have probably written one at some point, perhaps without even realizing it. A regular expression can be thought of as a little machine all on its own. We move it between states by feeding it characters, and we can take a peek at that state when we're done, so once we've had its last character. If it happens to be in the accept states, or here if it's in the rightmost one for the d character, then we say that the regular expression has matched. This is interesting. We can do this and yet the machine has no variables and no memory, which actually limits what it can do. What's the next step? How can we improve this? While we can do a lot with regular expressions, believe me, we can still perform seemingly simple tasks. It's impossible, for example, to write a regular expression to match balanced brackets, because it has no notion of counting or remembering.

This brings me to an interview question that a lot of you might have stacked up against. How can we tell if a string contains balanced brackets? Here's the solution and the first bit of live coding. You might have stumbled across this or even written something similar yourself. We can use a stack. Here we have two examples. This first one contains balanced brackets, and the second one doesn't. We've got this open parenthesis here, and then an angle bracket. Then I've just wrapped this in a little shell function, which calls our interview question, balanced, which tells us whether the brackets are balanced or not. This function just has a stack. Then for each character, if it's an open bracket, then we push a closed bracket onto the stack, similar here. Then, otherwise, we expect to pop off from the stack, the most recent one that we saw. If that's not what we expected, then we return false. Otherwise, we've got to the end of the string if we've exhausted the stack, then that means that we've got some balanced brackets. If I just run this, we'll notice the first example here is true and the second one is false, which is exactly what we expect. This is pretty useful. We've got a way to count now. We can look at nested brackets the same, but actually, we can do a lot more than that. We can now write passes for a large amount of programming languages, in fact, all just with using a stack. Surely, a stack, it seems easy. Is it that easy to write? Maybe we could have a look.

Let's take a look at this stack.rs that we have here. Now, first stack, we need a way to create it. We're going to need a way to push something onto the stack pop. It would be useful if we knew how to tell if the stack is empty as well. For this stack, we'll need some backing storage, so let's just use a vector. It's a pretty ubiquitous datatype. We can have a go at implementing these functions. The new function, we just want to return a stack, and we'll just initialize the vector there. Push, I suppose we want to take some kind of t. Then we'll just follow that onto the vector, so we'll just say push that onto the end of the vector. Pop is going to be similar, except this time instead of taking a t, we'll return a t. Again, we'll do a similar thing, except instead of pushing, we just pop from the vector. This gives us an option, so we'll just have to unwrap that and just assume that it will always succeed just to make things a bit simpler down the line. Finally, we've got is empty. For this we'll just want to return a Boolean. Again, we can almost cheat and just say if the vector is empty, then we're empty. I've got a test down here, that just pushes and pops onto the stack. We start out with a stack 123, it shouldn't be empty, we push 4, and then when we pop, we should get the 4 back. Then we'll get a 3, a 2, the 1, and then it's finally empty. Let's give that a little run. We'll compile it first. We just compile it with tests to check that that all works. Once that's built, if we run it, we should see our tests all succeeding. This is done. We've written a simple stack.

While stacks look simple, they're actually pretty magical, because they allow us to do one of the most exciting things in the world, math. One of the early things we learned in school is that we can't answer this equation by saying 12 minus 5, and then we add 2 to that, and then we times the whole thing by 3. It doesn't work like that. We've got to group the operations which we can arrange in a little tree. It's a bit like this. The tree helps us do things in order. We can perform a traversal where we walk along the tree to calculate the result. In fact, we can just demonstrate a little executor to do just this for us. Up here, I've just got some code, and we'll just ignore the topic for now, and we'll take a look at this bit. We've got an operator, and that operator is a plus. Then, what are we adding together? We're adding together the result of this operator, which is minus a 12 and a 5, and then also this operator, which is times and a 2 and a 3, which corresponds to part of the tree.

If we look at what we do with this code, we just say .eval on it. How does that work? If we look up here, the eval, an operator will then have a left and a right-hand side. It'll leave out both of those. Then depending on what operator we are, we'll move on to add or subtract or multiply. We'll just do that recursively. This can't go on forever. At some point, we hit a leaf like the 12 here. I suppose we need some value. To evaluate a value, it is just the value itself. Let's give that a run and see what it prints out. We get 13, which is actually the right answer, which is a relief. What we have here is called an interpreter. The Python code runs, and the Python code is what cranks and calculates it. We want to do a bit better than that. We don't want Python around to interpret an abstract syntax tree, which is what this bit is called. Let's instead try to generate some code. What we have here is pretty similar, except there's a few crucial changes, this time instead of printing something out, we just emit some code at the end. Our syntax tree is exactly the same. This time, for a value, instead of actually returning it, we don't want to return anything, we want to actually emit some code. Here, we're just going to use a stack familiar. We'll just push whatever our value is onto the stack. Then an operator is going to behave similarly. We'll get the left-hand side to emit itself and then the right-hand side to emit itself. Then we'll pop from the stack into, let's just call these two variables, so we have an l and an r. We'll pop into the r, because the right-hand side was the most recent thing that we emitted. Then we'll pop the next one into the L, which corresponds to the left-hand side. Then, as before, if it's a plus, we do a plus, if it's a minus, we do a minus, and so on. We'll push that result back onto the stack. If we run this, we now don't actually get an answer, we get what looks like a lot of gibberish to do with pushing and popping onto the stack.

This is interesting. We still need to compute our equation. We need some little machine language that helps us out with this. Here's how we might do that. We'll have a machine. I suppose the first thing that we need in this machine is a stack. Let's just add that in. We'll call it a stack of values just because these generic values we'll deal with. Then, let's create one of these machines, so this will be up here. We'll just initialize the stack to an empty stack. We want some way of running some code. We'll just say, m.run, and we want to give it some code in here. Rather than typing out this code by hand, I think we can probably use the code gen from earlier. Let's just tweak this a little so it looks a bit more Rusty. We'll have some instructions. For this constant here, we'll probably say instruction, Const, and then we'll just create an I32 with the value that we've got, like so. Then we'll want to do something similar for up here. We'll get rid of those pops and we'll have that be implicit in how the instruction works. For the add, I suppose we want to say we've got a binary operator, and this binary operator is an add instruction. Then we'll want to do something similar for the times and the subtract, so we just update that, like so. Then if we run this code gen, or if we read from it into here, we've now got our Rust code with some pretty simple operands ready to go.

How do we get the result of this out? We need to see what's at the bottom of the stack, so let's just print that for now. We'll just say, pop from the stack. We've got these instructions, but we don't actually have any implementation in our run method up here. Let's just pop something in here. We say for each instruction, what we're going to do is we'll effectively just switch on that instruction. Maybe we've got this Const instruction, which we'll just use to push onto the stack. Or maybe we've got this binary operation instruction, in which case, we'll want to go a little bit deeper and just decide on which binary operation it is. We've got either add, sub, or multiply, so we'll just pop those in here. For an add, we'll want to add some left-hand side to some right-hand side. Then similarly for the subtract and multiply. We just update those. That will give us our results. We'll just call that x. We want to push that onto the stack at the end a little bit like this. Where do we get those from? Similarly, to before, we want to pop the right-hand side first and then left. Let's do that. Then I suppose there might be some other binary operators, we'll just ignore them for now and pop in a panic. Same for other instructions as well. This is our machine. Let's see if it compiles. We've now got a machine and the output is 13. What we've done here is we've taken our syntax tree that we originally had in that Python code. We've generated some instructions for it, and we've now evaluated these in our little Rust machine. What's the use of this? Wouldn't it be handy if we could change the numbers, for example, and make our machine a bit more like a CPU, and tack on some features?

Memory

We've covered the stack and generated some stack code using pushes and pops to evaluate an equation. Stacks are great, but we can only look at the top value. Memory is a bit more powerful. Let's take a little look at that. We want our machine to be able to access a big flat array of bytes memory. Let's see what changes we need to make to our machine for that. First of all, we want some memory, actually within our machine, and we'll just say it's a vector of bytes. That's easy enough. Then we want to initialize that down here. Again, we can just say, create an array, we'll just go with 64k of memory. We'll just pop that into a vector. Now let's have a look at this 12 here, let's maybe make that living memory instead. What we'll do is we'll need to have an address for it. We'll just call it a, and let's have it live at address 42, why not? What we'll want to do is rather than having the 12 hardcoded here, we want to do a load from this address. We'll need some load instruction. Our 12 is gone, so we'll need to initialize that as well. Let's have some store on our machine. We want to store this 12 as a value, so we could say it's still an I32. We'll just pass that over to the store method of our machine. We now load in the 12 indirectly from memory. At the end of this, we want to have some way of storing the result, so let's have a store instruction. This store instruction will take this whole value here, which is calculated and it will store it at an address. We've got the value first, which is this part, and we also need an address. We'll have that as address r, which we'll call our result. Let's just pop that in memory address 12 for instance. That means now rather than popping the stack, we just load from that address, so we'll just say, load from address r, and there's a little bit of Rust casting going on. That should give us our result. We've changed our bytecode here. We now just need to implement these instructions, so a load in the store. Let's just move up here, and first of all let's do a load. If we're loading, I suppose we've got an address to load from, which we'll just pop off the stack. For this, we will then get our value. We'll just say self.load from that address, and we'll do a bit of a conversion there. Then we'll push that back onto our stack.

Store is going to be pretty similar, except this time we're going to have two operands instead of one. First of all, have the value. We'll say let value equals, like that, and then we'll get the address. Then we want to say self.store. Again, we will delegate to this store function. We'll store to a particular address, this value. Now we have just these two functions to implement. Let's just go to there. We'll do the load one first. This one is perhaps a little bit simpler in implementation. We'll just take an address, and we'll give back a value. There's a little bit of Rust wrangling that we need to do, that I'll explain shortly. We need to decide with this memory, what it looks like. It's just an array of bytes, but we're dealing with 32-bit integers. I suppose we're going to have a set of bytes, and four of them in a 32-bit integer that we'll want to load from memory. We'll just say self.memory, a particular address and address plus 4. We'll take a slice of four bytes of that memory. Then this is where our try_into comes from above, which is where we're just asserting that this particular size here is of size 4, and so it will fit into this array of 4. That'll always succeed. We can just effectively assert that at the end. That's great. We've got our bytes. How do we convert that into a 32-bit integer? We'll just match what most modern machines do. We'll say these bytes are little endian. What that's saying is, if I have the value 123, that's actually stored in memory as 3, 2, and then a 1. There's a few reasons for this. If we have a pointer to the start of it, and we truncate it, we actually just still get a 3, which is what we want, in most cases. We'll create an I32 from those little-endian bytes. That gives us our value. That's done.

Our store function is going to be pretty similar. This time, we're mutating ourselves, obviously, and we're going to want some value to store instead, and we won't return anything. This time, we'll have our bytes and we'll get that from the value itself. Let's just do up here. We'll get the bytes from the value. Then we want to assign that into memory. We've got this slice of 4, and we'll just say, copy from our local slice of bytes here. That's our store and our load. We've made a few decisions on things like endianness of our memory. Moment of truth, let's see if this compiles. It compiles. Does it run? We get exactly the same answer. It may seem like we've done a lot of work there for no benefit, but what we've actually done is added a whole new feature to our machine. It actually turns out, this is how languages like Java work. We've got a little Java program here that I've called eval. This program takes three inputs, it times these two of them together, and then adds the third one on to that. If we compile that, we get a little eval.class file. We can actually dump the bytecode that the Java has generated for us here. If I say javap -c Eval, this is the bytecode. Ignoring the first bit, you'll notice we've got our eval function, and we have some loads, multiplies, and then returns. What this is doing is exactly the same. It's a stack machine that it's pushing and popping from. I think we've made some good progress.

Goto

We are missing one major feature from our machine. Java has if statements and loops, which we don't have yet. Now, how would we implement an if? We can do some test, I guess we need. Then we need like a goto, which a lot of people tend to find morally offensive. There is a way around it. What we could do is we could have some branch instruction, which will behave as like our test or check. Then we need to decide where to branch to. We could say there's two choices. We might either want to restart a series of instructions to do them again, or we might want to skip ahead and just avoid doing a collection of instructions. That gives us two things, a loop and a block. That will look a little bit like this. As any game developer would probably point out, you can do anything with enough if statements and while loops. This is great, we can now do anything. Now, our branch instruction here, this br_if has an index in which block or loop we want to escape from, and then we either go to the start of the loop or the end if it's a block, which is what the arrows demonstrate here. I've also made up some call instructions here just for demonstration. You notice on the right, we've got this plain br instruction, which is a branch that will always be followed.

Why don't we have a go at implementing that in our machine. We'll get rid of these math evaluations from before. What we're interested in this time is adding up all of the numbers between 1 and 100. Here's our pseudocode, and that will look a little bit like this, so we just pop this into our run. Just bound some brackets. Then I'll talk you through it. We'll start with a t, t for total, is 0, and then for each number in 1 to 100, we'll just add that number to i. There's faster ways or better ways of doing this, but I think this is a good test of control flow. We need to decide where to store our t, where to store our i, and we'll have our limit variable. As before, we'll just initialize these two variables. Total will be 0, and i will be 1. What we've got here is we now have this loop instruction, which then contains more instructions itself. I'll just talk through that. We've got the loop block. Then if we have a look here, we'll load i, we'll load t, and then we'll add them together. Then we'll want to store that. We'll take the value and the address t, and we'll store. That's adding i to t. Similarly, below, we then take i, we load it. We then take 1, add it to i, and then we take i's address and we store it. We've added i to t, added 1 to i. Now we want to do a check. Let's load i again. We'll load our limit plus 1, so we'll load 101, and we'll subtract these. What's going on here is we're wanting to see if this subtraction comes to 0, because our test is basically going to say, is register 0? If it's 0, then we'll not take advantage, otherwise we will. If we've taken away 101 from i, and that 0, then that means i must be 101. In that case, we don't want to branch, otherwise, we'll say branch to a depth of 0. The 0 just means that we're going to go to this outermost loop here and continue the loop. Finally, at the end, I suppose we want to load our t and see what our result is.

We've got our bytecode here, and we just need to implement a couple of these new instructions. Let's have a look at that. I suppose the first one that we want to implement is our BrIF, and that's going to have some depth. Now with this, what we want to do is look at a condition that's on the stack, so let's just pop that off the stack first. We'll just say, if it's Boolean, so we'll just do a bit of an into to get that, then we need to return some way of saying break at this depth. Let's just invent an enum for that, let's just go up here and we'll create that. We've got our finish enum, and we can either finish code normally with a done, or I suppose we can break at some depth, we'll just use a u32 for that. We've got that. I suppose we want our unconditional version of this. We'll get rid of the if. We'll now pop the value, and we'll just say break at that depth. Those are our break instructions. Next, we need loop instructions. Let's pop that in here. We've got a loop and the loop contains some instructions. How do we implement a loop? That's easy, we can just pop it inside a loop. What we want to do is we want to run these instructions. Then, depending on how they finish, affects how we finish. If they come to some normal conclusion, then that's fine. We'll just complete ourselves as well. We'll break, and this break will just get us out of this loop. If they come to some kind of a break, then this is where things get a little bit more interesting. I suppose if it's breaking and the depth is 0, then that means it is us, so we're saying we want to branch back up to this loop. We'll just say continue here, but that's only if the depth is 0. Let's just inspect that. If the depth is 0, then we'll continue. Otherwise, the depth is greater than 0, so we want to propagate this break. Let's do that. When we propagate it, we just subtract 1 from the depth to make sure that it nests properly. If we're breaking out of here, then we'll subtract 1, and that will take us up to referencing the next block.

Our block instruction like our loop is going to be pretty similar. Let me just call that up. We'll have a loop here. Only this time if we hit a done, then there's no loop, so there's nothing to break from. A done will actually just do nothing. Let's just bring that back in. If we hit a break, then that means we want to finish executing the code that we're at if depth is 0, otherwise, we want to propagate it. We can just say if the depth is greater than 0, then propagate the break. Let's see if that compiles. We've got a few things here. Our actual function needs to return some finish, so we'll just add that in. The normal way of finishing will be just the normal done. Let's give that a while. When I run this now, what we're going back to is our code from before, where we're counting off the numbers between 1 and 100, and that should sum to 5050, which it does. I'm pretty happy with our little machine that we have now. We can take some machine code and we can execute it. It's quite laborious writing all of this machine code to run on our CPU, though. We've done a few nerdy examples here. I'm after running something a little bit larger, but I don't want to have to mess around trying to generate all of this bytecode. It'd be handy if someone could do this for us.

Mandelbrot

It just so happens that my machine that we have here is WebAssembly. Luckily for us, my colleague, Colin, wrote a WebAssembly compiler which can generate all of these little pushes, pops, consts, and so on for us. Just to clarify that, because it's a lot to take in, we've got a machine that can execute a bunch of instructions that we've made up. We need a machine to actually generate those instructions in the first place, so that's the code generator that gives us the WebAssembly that we feed into our machine, and then eventually we get the output. Colin's talk, what was that all about? It's a good overview for how to put together a compiler. Colin invented a language called chasm as shown here, and using that he called it a program which can generate an image of a Mandelbrot. At the end of his talk, Colin demonstrated this code by running it in a browser, which then displayed this Mandelbrot image. We're going to take the place of the browser. How did I go about this? I went to the website that Colin's made, and what I did was, you can either interpret this chasm code, or you can run it as a compiler. I ran that partway through, and I paused it before I was about to execute it, and I pinched all of the WebAssembly code from Colin's chasm that had been generated. This gave me this mandelbrot.wasm. We'll just have a look at that.

Here's the file. That's a binary format, so we can't actually see what's in it. There's a few tools that we can use that will do that for us. If I run this wasm2wat tool, that will give us the WebAssembly in a text format, and it stays nicely indented, so we can have a rough idea of what's going on. There's a few loops here, and so on. Let's get that into our machine. We'll just get rid of this bit to begin with. Now, we run the same thing, wasm2wat, and we'll run it on the Mandelbrot. This is great, so we've got it into our machine. The problem is, we can't execute this. This isn't Rust code, but we can tidy it up a little bit. We just get rid of some of the things that we don't need. Then we need to just convert this into the bytecode that we had before. It's a line-to-line conversion. It just so happens, I've got a throwback to earlier, so talking of regular expressions, they might not be powerful, but they're really useful. Here I've got some that can just convert all of these WebAssembly code into Rust for us. Let's just give that a go. Do all the substitutions and here we go. We had before the WebAssembly and now we've got it into Rust. That makes our lives a lot easier. I'll just sort out some brackets like this. Let's check, we've got everything right. That's our WebAssembly code.

There's a few extra instructions here you might have spotted. First, we've got a few extra operators. We've got this BinOp, this less than signed operator, and a couple of others. We just bring those into our machine up here. Always wanted to say this, here's one I made earlier. Here are binary operators, so we've got divide and less than, so we just pop them in there. Then we've also got this new unary operator. What this does is rather than the binary one which takes two, this just takes a single value. We've got an equal to 0 and a truncation one, so we'll just deal with that. It's the same story, you pop something off the stack, do something with it, and then push the result back onto the stack.

If we go back down, you also notice there's all these locals, which we can get and set now. That's how Colin implements these variables, so x and y. A local is like a fixed bit of memory. We can just implement that similar to our memory up here. Again, we're just putting that in. We'll just treat the locals as a bit of a dictionary. We'll get the indexed local, and we'll push it onto the stack for a local get. For a set, we'll just insert that into our dictionary. We need to create this dictionary. Let's just do this here. Let's use a hash map. It's an index that maps on to a value, so we just need to bring that in as well. I suppose we'll need to pass that around wherever we call run. We'll just say, instructions and locals like that, and right at the end here as well. We've got our local passed around, and we'll just create them up here. We can just assume that Colin's code will always initialize a local before using it, so we don't need to worry about any of that. There's one other instruction that we need to look at. This is a store instruction, actually a store8, which is what Colin used for generating the bitmap. We want to just take the bottom 8 bits of a value and store them. Let's just implement that up here near the store instruction. Similarly, to the store, this one also pops a value, pops an address. Then we take the bytes and then we'll just take the bottom byte, so the zeroeth byte and store that at that address. Those are the extra instructions now added.

There's one other thing though, we're generating a Mandelbrot, but at the moment, we've just been popping a single value from the stack. How can we view this? The way that Colin's code works is it generates the Mandelbrot in memory, so it's like a 100 by 100 image. If we have some bitmap, I suppose we want to have like an image. We'll just say, it's 100 by 100 image. Then we need to populate this image from the memory that's in our machine. Then once we've done that, we'll just print it out. Actually, let's take a look at it. How do you do that? We can just go over all of the coordinates. For each coordinate, we'll get the eighth byte of memory. We'll just say, the image at that coordinate is that byte. This will take us left to right, top to bottom of the image. Let's see if that works. We've got a few imports missing, and I just need to add. Let's add one. We need to bring in our image and pass a reference to our locals here as well. There's a reference to our locals, and our image is in this tty_bitmap module. We just bring that in and use the image from it. There's some unused variable, that doesn't matter. This is great. When we run this, it should generate a Mandelbrot. This is really the white-knuckle ride, because it's quite a bit of code and we have no idea whether this will actually work correctly, because I've just cobbled this together. Here's the moment of truth. Hopefully, it will give us a Mandelbrot image at the end. There, the program's done. What do we have here? It's the same Mandelbrot image from Colin's, rotated a little because there's differences in where the y axis starts. There it is, Colin's code actually running on our little machine. I'm really pleased with that. I'm glad that worked.

What's going on here? You could say this is a chasm program running in an awful interpreter. Do not use this in production. The software for it is chasm code, or WebAssembly. You can call it whichever you like, both is true. To quote Obi-Wan, it depends on your point of view. It's pretty amazing. We've taken this WebAssembly thing, and we've run it nowhere near a browser just from something that we've knocked together in the past 30 minutes, and we've still got the same result. We can peek behind the scenes and decide actually how we want this WebAssembly code to run. We can not only run WebAssembly code, but anything that compiles to WebAssembly. We can run C code, C++, Swift, or even perhaps Rust code as WebAssembly, then inside our Rust interpreter, which I think is pretty cool.

Recap

We've managed to go all the way from having a simple stack up to running pretty arbitrary code on our little machine. I think this is amazing. It really is turtles all the way down.

Questions and Answers

Eberhardt: We've got a few observations. One of them is Mandelbrot fractal! It's my favorite fractal. I don't know about you.

Pilling: Yes, mine too. It would have been cool if I could have added a bit of a zoom and just go into the Mandelbrot and see if it's turtles all the way down there. It's got to stop eventually.

Eberhardt: I've heard that it doesn't. I haven't got the computing power at my disposal.

Pilling: Same. Especially not if you're running that interpreter that I wrote as well.

I quite liked that about WebAssembly. Once you can grasp the main instructions that you have, so stores, loads, and so on, you can tell what everything's going to do. It like opens up the rest of WebAssembly for you really. It's just a very accessible language.

Eberhardt: Yes, it's super simple. Have you done anything with other assembly languages before then?

Pilling: Yes. I've done a little bit of tinkering with x86. That's the central language that most of our machines run on. Unless you're on one of the new M1 Macs or a phone, of course, which is Arm. In my spare time, I've written a C compiler, which targets those. That's quite a bit more difficult than WebAssembly. There's null safety there. If you're writing C or anything like that, you get the whole jungle, as well as the banana and the gorilla that's holding on to it.

Eberhardt: How does the instruction set actually compare? Because I know the WebAssembly instruction set quite well. I couldn't tell you all the instructions, but I know there are about 40 or 50, and they're all really quite simple. In x86, how does it compare? Is it a larger instruction set? Are the instructions in some cases a little bit more complicated in terms of the operation you said before?

Pilling: Yes. There's a lot of routes that we could talk about there. When you're asked about the size of the instruction set, so for x86, there's something ridiculous, like 50,000 instructions, and this is including all of the Single Instruction, Multiple Data like super scale.

Eberhardt: Yes, SIMD massively multiplies.

Pilling: Yes, exactly. Then you've got a lot of built-in instructions that aren't really used anymore, but are just there for backwards compatibility, so running it in 16-bit mode, or you've got things like figuring out sine, so trigonometric functions that no one uses because they can actually be implemented faster in software.

Eberhardt: There are trig functions on the CPU?

Pilling: Yes, exactly. I think there's several cryptographic instructions on the CPU as well. There was a bit of a conspiracy about that where the Linux kernel was going to use some of these and some people were like, "No, we can't use that. We don't know the source of randomness. We don't know if it's cryptographically sound," and so on. Yes, the instruction set has really swelled over the years and there's quite a lot of scope creep. If you take it back to its basics, like the original 8086 chip, you can draw a lot of parallels with WebAssembly. You've got the basic add, multiply, and so on. Then you've got your loads and stores, your calls and returns and whatnot. I always get the feeling with WebAssembly, it's a lot safer, and it's more structured. Loops, as you saw in the talk, it's already demarc'd, what exactly will be looped over. Whereas in x86, you just jump and it just so happens to make a loop.

Eberhardt: By jump, you effectively mean jump to a different memory address.

Pilling: Yes, and that's very arbitrary as well.

Eberhardt: Whereas with WebAssembly, it's break to a particular stack depth.

Pilling: Yes, exactly that. Which you would think one would be more powerful than the other, but it turns out, they're both equally as powerful and equivalent in the end, which I also find interesting.

Eberhardt: From a security perspective, break to a stack depth is inherently more secure thank jump to a random address.

Pilling: Yes, exactly, and just as easy to generate code for.

See more presentations with transcripts

Recorded at:

Mar 03, 2023

Rob Pilling

InfoQ Software Architects' Newsletter