Build Your Own WebAssembly Compiler

Summary

Colin Eberhardt looks at some of the internals of WebAssembly, explores how it works ‘under the hood’, and looks at how to create a (simple) compiler that targets this runtime.

Bio

Colin Eberhardt is the Technology Director at Scott Logic, a UK-based software consultancy where they create complex applications for their financial services clients. He is an avid technology enthusiast, spending his evenings contributing to open source projects, writing blog posts and learning as much as he can.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Eberhardt: My name is Colin Eberhardt. I work for a UK-based software consultancy called Scott Logic. In time honored tradition, I'm going to start off by plugging my book. This is, "What Is WebAssembly" by O'Reilly. It arrived in the post just a few weeks back. It looks good from that direction, less good from that direction. I call it a book. It's more of a pamphlet, really. There's my pamphlet.

Why We Need WebAssembly

Why do we need WebAssembly? That can be summed up in this one slide alone. JavaScript these days is a compilation target. It wasn't intended for this purpose. When it was first invented just over 25 years ago, it was intended as a way to add just a little bit of interactivity into what was otherwise quite a static web. However, 25 years from then, we're using it in quite a different way. I think the two key differences are, one, we're writing a lot of JavaScript. Collectively, we're writing tons of JavaScript. The other significant difference is the way that we deploy our JavaScript, the way that the JavaScript finds its way to our browsers has changed incredibly in the last 25 years. 25 years ago, you'd have written a bit of JavaScript in some text editor, and that would be transferred over HTTP, verbatim, and executed within the browser. These days, you're probably using React or TypeScript. You're using bundlers. You're using minifiers. All kinds of clever transformations are taking place before the code is transferred to the browser in this somewhat mangled form. This is why people now consider JavaScript to be very much a compilation target. Our tooling takes our code, transforms it, and compiles it for delivery to the browser. That's ok, isn't it? No. It's really terrible. The reason it's really terrible is because it was never designed for this. JavaScript was not designed to be a compilation target. It's a terrible compilation target.

This slide, summarized from a Mozilla post, gives a rough overview of how the browser consumes and executes JavaScript. Starting on the left-hand side, JavaScript is transferred over HTTP. It's received as a series of characters, which are then first parsed into an abstract syntax tree. From there, it's able to generate a bytecode which is run within an interpreter. At that point, your JavaScript is up and running. Interpreters are typically slower than compiled languages. These days, what happens is the JavaScript engine will observe your code as it is running. It will be able to make certain observations. From that, it will be able to compile your code to allow it to execute faster. For example, it may look at the various types that are being passed around, determine that these two types are always integers. I'll compile that into a form where they're integers, and I'll execute it more quickly. That compilation process is tiered. There are simple assumptions that can be applied. There are more complex assumptions that can be applied. Safari has four different tiers within its JavaScript engine. What that means is by the time you get to the right-hand side, your JavaScript is running really fast. If you compare it to benchmarks of C++ running native, JavaScript gets pretty close. The problem is it takes a very long time to get there. That is because of the way JavaScript is delivered to the browser.

If we then flip this over, and have a look at what the impact is from a user perspective. When you're developing web applications, you're spending a lot of your time trying to minimize the overall bundle sizes, your image sizes. You care about how long it takes for your JavaScript application to be interactive and usable by the end user. If we look at it from the end user perspective, this is a very simplified timeline running from left to right. There's a time impact to parsing the JavaScript. There's a time impact to compilation, optimization. There's a time impact of re-optimization. I know some of these are interspersed. Finally, it executes at a pretty impressive speed. There's also garbage collection as well. The whole process is really quite inefficient when you look at it.

What Is WebAssembly?

WebAssembly has a number of different definitions. WebAssembly or WASM, is a new portable, size and load-time-efficient format suitable for compilation to the web. In that one sentence, it's addressing some of the core, fundamental problems with JavaScript, and how that is delivered to the web. WebAssembly was designed to solve those problems. If we pick it apart, it's portable. It's a web technology. It has to be portable. It has to run in various different browsers. It also works outside of the browser as well. It's size and load-time-efficient. It was designed to minimize the payload size. It was designed to be fast. Suitable for compilation to the web. This is the key difference. JavaScript was never invented as a compilation target, whereas WebAssembly, from day one, is designed to be a compilation target.

If we compare the timeline of loading and executing JavaScript within the web, at the top, compare it to, roughly the equivalent, with WebAssembly. There is a decoding step. There's a compile and optimize step. It still has to be compiled to the underlying architecture. However, it's much simpler. It's much faster. There's no garbage collection. The goal with WebAssembly is to basically do the same job but a lot faster. That's why I think we need WebAssembly. A few people are getting quite excited about it. This is a quote from Yehuda Katz, who is on the Ember.js team. "JavaScript code is much more expensive, byte by byte, than an image because of the time spent parsing and compiling it." It's possible to parse and compile WASM as fast as it comes over the network, which makes it much more like an image than JavaScript code. It's a game changer. It changes the way that we think about our bundles and our deployment to the web.

Why Create a WebAssembly Compiler?

Why create a WebAssembly compiler? Why am I talking about building a WebAssembly compiler? I actually came up with the idea about 8 months ago. There are a couple of things that sparked my interest in this. The first was the results of the Stack Overflow survey. Every year they survey a significant number of developers, something like 30,000. What I love about this survey is it doesn't just ask people what they're using, it asks them about their sentiments. Here, they publish the most loved, dreaded, and wanted languages. WebAssembly was the fifth most loved language at that point in time. It might have even more love now. I thought that's really interesting because the other 20 languages that appeared in the most loved table were languages that you or I would write as our day job. Yet WebAssembly is a compilation target. How can people love it as a language when it's a compilation target? For me, I thought, clearly, people want to know more about this language and this technology. Maybe I should tell them a little bit more about how it works under the hood. A vast majority of the time, as a user of WebAssembly, you're going to be using a tool chain that emits WebAssembly as the end result. You're not going to delve into WebAssembly. Clearly, we have an interest.

Bucket List

The other reason is I came into software engineering from a physics background. There are certain things within programming I've always wanted to do, so create an open source project. I've got a developer bucket list. Meet Brendan Eich, the guy that invented JavaScript. Met him. Lovely guy. Write an emulator. Create my own language and compiler. I've always heard the terms. I'd heard of abstract syntax trees. I knew a little bit about what tools like Babel do. I'd never had the opportunity to actually sit down and try to create a language and a compiler myself. I thought I'll couple the two together. I'll create my own language. I'll compile to WebAssembly. I'll do a talk about it. This talk is very much a fun, pet project for me.

I'm not a language designer by trade. I didn't intend to and I wouldn't intend to create a first class language that I'd expect any of you to use. My one goal was to create a language that had sufficient capability to do one interesting job, and that is render the Mandelbrot set. This goes back to one of the very first programs I wrote many years ago in Pascal, which was to render the Mandelbrot set. For me, it's always been a fun step-up from a simple Hello World application. My language does just that. It's not a very good language.

A Simple WASM Module

This is a compiler that sits within the browser. The compiler itself is written within JavaScript. The idea is, I have a web page where I can write code. It compiles it into WebAssembly, and executes it within the browser. We'll start with the simplest output possible. We'll start by looking at how you create the most basic WebAssembly module. This is it in just a few lines of code. The simplest WebAssembly module is composed of just two pieces. One of them is the magic module header. If you're good with your ASCII, hex 61, 73, 6d is "asm". That identifies it as a WASM module. The next part is the version which will allow them in future, potentially, to make backwards incompatible changes. My emitter, which emits my WASM module, simply concatenates these two arrays together. That's it, my simple WebAssembly module. In the real world, typically, this is output as a .wasm file and delivered over HTTP. However, here, I'm creating the WebAssembly module dynamically within the browser.
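The slide itself is not reproduced in this transcript, so here is a minimal sketch of what such an emitter might look like. The names `magicModuleHeader`, `moduleVersion` and `emitEmptyModule` are illustrative assumptions, not necessarily the ones used in the talk.

```typescript
// Magic header: the bytes 0x00 0x61 0x73 0x6d, which read as "\0asm".
const magicModuleHeader: number[] = [0x00, 0x61, 0x73, 0x6d];

// Binary format version 1, stored as a little-endian 32-bit integer.
const moduleVersion: number[] = [0x01, 0x00, 0x00, 0x00];

// Hypothetical emitter: concatenates the two arrays into the module bytes.
function emitEmptyModule(): Uint8Array {
  return new Uint8Array([...magicModuleHeader, ...moduleVersion]);
}

const emptyModule = emitEmptyModule();
```

The result is eight bytes long, which is the smallest valid WebAssembly module.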

WASM Module Execution

How do we execute it? Here, I'm executing my emitter to give me my WebAssembly module. Then I'm using the WebAssembly APIs to instantiate an instance of my WebAssembly module. One important thing to learn here is there's a relationship between WebAssembly and JavaScript. JavaScript is termed the host for WebAssembly. WebAssembly always requires some form of host. The reason being that WebAssembly in itself doesn't have any I/O capabilities. In order to interact in any way it needs to be instantiated and work alongside a host. This particular WebAssembly module does absolutely nothing. Let's have a look at what we need to do to make it do something a little bit more interesting.
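As a rough sketch of that instantiation step, the snippet below uses the synchronous `WebAssembly.Module` and `WebAssembly.Instance` constructors. That is a simplification chosen to keep the example short; the talk most likely used the asynchronous `WebAssembly.instantiate`, and browsers may restrict synchronous compilation on the main thread to small modules. The variable names are assumptions.

```typescript
// The eight-byte empty module: magic header followed by version 1.
const emptyModuleBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Check the bytes form a valid module before compiling them.
const isValidModule = WebAssembly.validate(emptyModuleBytes);

// JavaScript is the host: it compiles the bytes and creates an instance.
const compiledModule = new WebAssembly.Module(emptyModuleBytes);
const emptyInstance = new WebAssembly.Instance(compiledModule, {});

// This module exports nothing, so there is nothing for the host to call yet.
const exportedNames = Object.keys(emptyInstance.exports);
```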

An 'add' Function

The next step towards creating my compiler is to create a very simple add function. What I'm showing you in the top left-hand corner is that, whilst WebAssembly is a binary format, you can also represent it in a text format called WAT, WebAssembly Text Format. It's a lovely name. I was hoping they'd call it WTF, WebAssembly Text Format. They didn't. What you can see here is an add function. It takes two parameters, which are of type float 32, and returns a result. The body of this function has three operations, get_local 0, get_local 1. That retrieves the two parameters that have been passed to the function. Then f32.add, that's an add operation. WebAssembly has a relatively simple instruction set. If you've done any assembly language programming, if you've done 6502, or PIC programming, or anything like that, it has that feel to it. It's at that level. WebAssembly only has four numeric types. It has two floating point types and two integer types. That's it. You can construct more complex types. Also, WebAssembly is a stack based language. What we see here for our add operation, is that we first get the local at the zero index, get the local at the first index, and those are loaded onto the stack. The add operation pops those two values from the stack, adds them together, pushes the result back onto the stack. The function return is the value that remains on the stack. WebAssembly has no built-in I/O. The final line here is a function export. What that does is tell the host, which is, in this case, JavaScript, that it is able to execute that function directly.
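To make the stack behaviour concrete, here is a toy interpreter for just these two instructions. It is purely illustrative: the `Instruction` type and `runFunctionBody` are invented for this sketch, and it works on JavaScript doubles rather than true 32-bit floats. Note also that current versions of the text format spell `get_local` as `local.get`.

```typescript
type Instruction =
  | { op: "get_local"; index: number }
  | { op: "f32.add" };

// Executes a function body against its parameters using an explicit stack.
function runFunctionBody(body: Instruction[], params: number[]): number {
  const stack: number[] = [];
  for (const instruction of body) {
    if (instruction.op === "get_local") {
      // Push the parameter at the given index onto the stack.
      stack.push(params[instruction.index]);
    } else {
      // f32.add pops two operands and pushes their sum.
      const right = stack.pop();
      const left = stack.pop();
      if (left === undefined || right === undefined) throw new Error("stack underflow");
      stack.push(left + right);
    }
  }
  // The function's return value is whatever remains on the stack.
  const result = stack.pop();
  if (result === undefined) throw new Error("function left nothing on the stack");
  return result;
}

const stackSum = runFunctionBody(
  [{ op: "get_local", index: 0 }, { op: "get_local", index: 1 }, { op: "f32.add" }],
  [5, 6]
);
```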

WebAssembly as a binary format is composed of multiple sections. It's a really easy format to inspect and decode. We've already seen the header and the version. Following that, it has a type section, which is all the type information for the various functions. It expresses the imports, and the function types, and so on. I'm only showing you this because most of the code that I'm going to show you just has a few bits of ASCII code. This is, roughly speaking, how it's all assembled together. Each of the sections has a numeric identifier. As the WebAssembly is decoded, it can determine which section it is. It's a very simple format. There are some quite nice tools you can use online where you can load a WASM module, and it lays it out like this.

Encoding the 'add' Function

How do I encode my add function using JavaScript, using my emitter? We'll start with the code block. This takes the two get_local operations and the f32.add operation, and encodes them as an array. Taking it step by step, I have an enumeration called opcodes, which is all of the various opcodes. All the instructions that WebAssembly supports. After my first opcode of get_local, the second byte is the index that's passed to get_local. This unsigned LEB128 is a standard variable length encoding. You can look it up on Wikipedia. This is how I encode get_local 0, get_local 1, f32.add. The output code is basically an array of bytes. This is then bundled up into a function. It's a very simple encoding. I'm not showing you the code for encode vector. It's little more than a concatenation of all the bytes prefixed by the overall length. That takes my code and encodes it within a function body. Finally, at the bottom, I bundle that up into a code section. There's quite a bit of scaffolding code going on here. That scaffolding is not very interesting. There's a GitHub project, and you can dig around. It's not interesting. It's really simple.
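The encoding steps described here might be sketched as follows. The helper names follow the ones mentioned in the talk (`unsignedLEB128`, encode vector), but their bodies are my own reconstruction, and this `encodeVector` simply prefixes a byte array with its length.

```typescript
const Opcodes = { get_local: 0x20, f32_add: 0x92, end: 0x0b };

// Unsigned LEB128: 7 bits per byte, high bit set on every byte except the last.
function unsignedLEB128(n: number): number[] {
  const bytes: number[] = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (n !== 0);
  return bytes;
}

// Prefix a run of bytes with its overall length.
function encodeVector(data: number[]): number[] {
  return [...unsignedLEB128(data.length), ...data];
}

// get_local 0, get_local 1, f32.add
const addCode: number[] = [
  Opcodes.get_local, ...unsignedLEB128(0),
  Opcodes.get_local, ...unsignedLEB128(1),
  Opcodes.f32_add,
];

// A function body: zero local declarations, the code, then the end opcode.
const addFunctionBody = encodeVector([0x00, ...addCode, Opcodes.end]);

// The code section (id 0x0a) holding a vector of one function body.
const addCodeSection = [0x0a, ...encodeVector([0x01, ...addFunctionBody])];
```

For small values LEB128 is just the value itself in one byte, which is why `get_local 0` encodes as `0x20 0x00`.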

Now that I've constructed that, I can instantiate an instance of my WebAssembly module. This time, I can actually invoke the exported functions. Here, I'm doing a console.log. I'm invoking that function, which adds two numbers together, 5 and 6, outputting the value 11. This time I've constructed a WebAssembly module that does something marginally useful. If you were to output that as a WASM module, it would look, roughly speaking, as you see at the bottom there. The part I've outlined in white, that's the get_local 0, get_local 1, add operation. That's the interesting part of the code. That is how to use JavaScript to create a fairly simple WebAssembly module that adds a couple of numbers together.
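For reference, a complete module for this add function can be written out byte by byte. This is a hand-assembled sketch rather than the emitter output shown on the slide; the export name `add` and the variable names are assumptions, and the synchronous constructors are used only for brevity.

```typescript
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                               // magic header "\0asm"
  0x01, 0x00, 0x00, 0x00,                               // version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7d, 0x7d, 0x01, 0x7d, // type section: (f32, f32) -> f32
  0x03, 0x02, 0x01, 0x00,                               // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00,                                           // get_local 0
  0x20, 0x01,                                           // get_local 1
  0x92,                                                 // f32.add
  0x0b,                                                 // end
]);

const addInstance = new WebAssembly.Instance(new WebAssembly.Module(addModuleBytes));

// Exported functions are callable from the JavaScript host like any other function.
const addFunction = addInstance.exports.add as (a: number, b: number) => number;
const addResult = addFunction(5, 6);
```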

Building a Compiler

My next step is to look at how I construct a compiler to do that instead. What I want to do is invent a language and use a compiler to create that output rather than hard coding the function body. I'm going to make sure that we all have a clear understanding of the terminology here. There are various component parts to my simple language. My language is comprised of an array of statements, and there are different statement types. You can see a variable declaration statement here, which defines the variable b. It looks like any familiar programming language. There are variable assignment statements. This time I take a pre-existing variable and assign a new value to it. While statements. Statements can have other statements nested within them. Other component parts are expressions. The key difference between statements and expressions is expressions return a value, whereas statements do not return values. That's pretty much the only difference. Also, here, you can see that expressions can have a tree-like structure, again, returning a single value as the result. Those are the basic building blocks of my language.

Then if I look at the basic building blocks of my compiler, it's comprised of three component parts. You've probably heard these terms before. I've heard them but I've not had the chance to leverage them. The code is first analyzed by my tokenizer, which outputs an array of tokens. The tokens are then parsed into an abstract syntax tree. This is then fed into my emitter. My emitter outputs the WASM, the WebAssembly module. We'll start delving into those component parts. Hopefully, those terms will make a bit more sense as you see them.

Chasm v0.1

My language is called chasm. The way I developed my language was to do it in an iterative fashion. My first version, Version 0.1 towards my goal of rendering a Mandelbrot set. All it did was it prints output. It just has print statements. It's a pretty dumb language. It gives me the opportunity to describe the various component parts.

The Tokenizer

The goal of the tokenizer is to take your application code and to output an array of tokens. Let's have a look at how that works. I'm not going to show you the source code. It's actually easier to dynamically show you how the source code works. The source code itself is only about 20 lines of code. The way my tokenizer works is it has a collection of patterns which are encoded as regular expressions. It iterates over the input, advancing the cursor forwards. At each point, it matches my regular expressions. If a regular expression matches, it may or it may not push a token to the output.

Let's have a look at how this works. The input at the bottom is print 23.1. At the first location, my whitespace pattern matches. Whitespace in my language is meaningless, so no tokens are output. At the next point, my keyword pattern matches. This keyword pattern matches either print or var. In this case, it pushes the token to the output. That's an interesting part of the code. Also, you'll note that the tokenizer also outputs the index. This is useful for debugging later on. When I have syntax errors in my code, I can actually highlight where that problem occurred because my tokenizer stores the location of the tokens. Next up, we advance to the end of that match. Again, we hit whitespace, which has no interest, no meaning. Finally, we match another pattern which matches numeric literals. If any of you are any good with regular expressions, you'll know that's doing a pretty bad job of matching numerics. I wanted to go for a simple one that didn't cover the slide in a mess. That's the output. What have we achieved here? One key thing is we've removed whitespace. That's because, in my language, it doesn't have any meaning. It has no semantics. That's not true of other languages. In my case, that's the way I designed it. It also provides a very basic validation of syntax. Just because my input can be tokenized doesn't necessarily mean it can execute properly, but it does catch a certain class of errors.
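Since the tokenizer source is not shown, the following is a minimal sketch of the approach described: a list of regular expressions tried in turn at the current cursor position. The `Token` shape, the matcher list and the deliberately naive number pattern are assumptions modelled on the description above.

```typescript
interface Token {
  type: "keyword" | "number";
  value: string;
  index: number; // position in the source, kept for error reporting
}

// Each pattern is anchored so it only matches at the start of the remaining input.
const matchers: { type: "whitespace" | "keyword" | "number"; regex: RegExp }[] = [
  { type: "whitespace", regex: /^\s+/ },
  { type: "keyword", regex: /^(print|var)/ },
  { type: "number", regex: /^[.0-9]+/ }, // simplistic, as the talk admits
];

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let index = 0;
  while (index < input.length) {
    const rest = input.substring(index);
    let matched = false;
    for (const matcher of matchers) {
      const match = matcher.regex.exec(rest);
      if (match === null) continue;
      // Whitespace carries no meaning, so it advances the cursor without emitting a token.
      if (matcher.type !== "whitespace") {
        tokens.push({ type: matcher.type, value: match[0], index });
      }
      index += match[0].length;
      matched = true;
      break;
    }
    if (!matched) throw new Error(`Unexpected token at index ${index}`);
  }
  return tokens;
}

const printTokens = tokenize(" print 23.1");
```

Input that matches none of the patterns throws with the offending index, which is the "very basic validation of syntax" mentioned above.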

The Parser

Now that we've got an array of tokens, we want to pass that to the parser and output an abstract syntax tree. This is my parser. It's a little bit more complicated, but what I'm going to do is call out the key parts. Look at the parts that I'm highlighting, and hopefully, you'll understand it. Again, the parser does the same thing. It advances a cursor through its input. It always maintains a pointer to the current token. At this point in time, the current token is the first token. I have a function called eatToken, which just advances to the next location. Every parser I've looked at seems to eat tokens. I have no idea why. They must be very hungry. Next up, my parser expects my inputs to be arranged as an array of statements. What this is doing is it's advancing from one token to the next. It's expecting at each point for there to be a statement. Here, what it's doing is it's parsing one statement followed by another, and outputting that as my abstract syntax tree.

Let's have a look at the statement parser. What this does, initially, is it checks the current token type. In this case, it always expects keywords. In my very simple language, all I'm expecting at the top level is a bunch of print statements. One thing to point out here is I'm not showing any of the failure paths. The failure paths are, if none of this matches, throw an error. Here, we are matching on the token type. Next, we match the value. At the moment, I only have print statements. As the vocabulary of my language expands, this switch case will expand accordingly. We're eating a token, which advances us to the next token, which is the number that I want my print statement to output. In this instance, we're now parsing an expression. Each print statement has a corresponding expression. Here's my expression parser. The first thing it's doing is matching the token type. At this point in the evolution of my language, the only expressions I support are numeric literals. Basically, numbers on their own. The only thing it does is it takes the string and converts it into a JavaScript number, and eats a token. We're at the end of the piece of code. The token input is on the left-hand side. It's an array of tokens. The abstract syntax tree, which has started to have that nested structure that you get with abstract syntax trees, is on the right-hand side.
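A parser along these lines, for the v0.1 language of print statements and numeric literals, might look like the sketch below. Only `eatToken` is a name taken from the talk; the node shapes and the other function names are assumptions. Unlike the slides, it includes the failure paths.

```typescript
interface Token { type: "keyword" | "number"; value: string; }
interface NumberLiteralNode { type: "numberLiteral"; value: number; }
interface PrintStatementNode { type: "printStatement"; expression: NumberLiteralNode; }

function parse(tokens: Token[]): PrintStatementNode[] {
  let position = 0;
  // Advance the cursor to the next token.
  const eatToken = (): void => { position++; };

  // The only expressions supported so far are numeric literals.
  const parseExpression = (): NumberLiteralNode => {
    const token = tokens[position];
    if (token === undefined || token.type !== "number") {
      throw new Error("Expected a number");
    }
    eatToken();
    return { type: "numberLiteral", value: Number(token.value) };
  };

  // Match on the token type, then on its value.
  const parseStatement = (): PrintStatementNode => {
    const token = tokens[position];
    if (token.type === "keyword" && token.value === "print") {
      eatToken();
      return { type: "printStatement", expression: parseExpression() };
    }
    throw new Error(`Unknown statement: ${token.value}`);
  };

  // A program is one statement after another.
  const ast: PrintStatementNode[] = [];
  while (position < tokens.length) {
    ast.push(parseStatement());
  }
  return ast;
}

const printAst = parse([
  { type: "keyword", value: "print" },
  { type: "number", value: "42" },
]);
```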

The Emitter

The emitter takes the abstract syntax tree as an input, and does exactly the same thing. It iterates over the array of nodes in my abstract syntax tree. What it does is it switches on the statement type. At the moment, I only support print statements. When it encounters a print statement, the first thing it does is it emits the expression associated with the print statement. WebAssembly is a stack machine. Operations expect the values that they operate upon to already be present on the stack. If I'm printing a particular number, I have to load the stack with that number first. Then I call my expression emitter. Very simple. I'm only supporting numeric literals. What I'm doing is I'm using the f32 const WebAssembly instruction, which pushes a constant to the stack. Here, it's using IEEE 754, everyone's favorite floating point encoding. Finally, the print statement itself is a call operation. This is calling a function at index 0. WebAssembly modules can export functions. They can also import functions. When I set up my WebAssembly module, I'm importing console.log, which is what allows me to print a value.
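Putting those pieces together, an emitter for print statements could be sketched as follows. The import is named `env.print` and the exported entry point `run`; both names are assumptions, as are the helper names. To keep the example checkable, the host function records values in an array instead of calling console.log.

```typescript
type PrintStatement = {
  type: "printStatement";
  expression: { type: "numberLiteral"; value: number };
};

// Encode a number as a little-endian IEEE 754 single-precision float.
function ieee754(value: number): number[] {
  const buffer = new ArrayBuffer(4);
  new DataView(buffer).setFloat32(0, value, true);
  const bytes = new Uint8Array(buffer);
  return [bytes[0], bytes[1], bytes[2], bytes[3]];
}

function unsignedLEB128(n: number): number[] {
  const bytes: number[] = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80;
    bytes.push(byte);
  } while (n !== 0);
  return bytes;
}

const encodeVector = (data: number[]): number[] => [...unsignedLEB128(data.length), ...data];
const encodeSection = (id: number, content: number[]): number[] => [id, ...encodeVector(content)];
const encodeString = (text: string): number[] =>
  [text.length, ...text.split("").map(c => c.charCodeAt(0))];

function emitCode(ast: PrintStatement[]): number[] {
  const code: number[] = [];
  for (const statement of ast) {
    // Load the operand onto the stack first: f32.const followed by the float bytes.
    code.push(0x43, ...ieee754(statement.expression.value));
    // Then call function 0, which is the imported print function.
    code.push(0x10, ...unsignedLEB128(0));
  }
  return code;
}

function emitModule(ast: PrintStatement[]): Uint8Array {
  const f32 = 0x7d;
  // Two types: type 0 is (f32) -> () for print, type 1 is () -> () for run.
  const typeSection = encodeSection(0x01, [0x02, 0x60, 0x01, f32, 0x00, 0x60, 0x00, 0x00]);
  const importSection = encodeSection(0x02,
    [0x01, ...encodeString("env"), ...encodeString("print"), 0x00, 0x00]);
  const functionSection = encodeSection(0x03, [0x01, 0x01]);
  // The import takes function index 0, so run is function 1.
  const exportSection = encodeSection(0x07, [0x01, ...encodeString("run"), 0x00, 0x01]);
  const body = encodeVector([0x00, ...emitCode(ast), 0x0b]);
  const codeSection = encodeSection(0x0a, [0x01, ...body]);
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
    ...typeSection, ...importSection, ...functionSection, ...exportSection, ...codeSection,
  ]);
}

const printedValues: number[] = [];
const printModule = emitModule([
  { type: "printStatement", expression: { type: "numberLiteral", value: 42 } },
  { type: "printStatement", expression: { type: "numberLiteral", value: 7.5 } },
]);
const printInstance = new WebAssembly.Instance(new WebAssembly.Module(printModule), {
  env: { print: (value: number) => { printedValues.push(value); } },
});
(printInstance.exports.run as () => void)();
```

In the browser the `print` import would simply be `console.log`, as described in the talk.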

Demo

Let's see if this actually works. Justin, name a number.

Justin: 7-and-a-half.

Eberhardt: 7-and-a-half. When you point to a tester, they always go for a minus number straight away. Let's have a look. Let's run the compiler. As we execute it, it outputs 7-and-a-half. Let's add another one, 35. That's excellent. I was very excited when that actually worked.

Recap

The input at the top is print 42. I could have cued you up with that number. I was hoping you'd say 42. This is then fed into my tokenizer, which creates the array of tokens. The tokens are passed into the parser, which creates the abstract syntax tree. The emitter takes the abstract syntax tree and emits the WASM module. These are the interesting parts of the WASM module output, not all the junk either side.

WebAssembly modules don't have any built-in I/O capabilities. The WebAssembly virtual machine has a very close working relationship with the JavaScript host. The program is able to push and pop values to the stack. You can't access the stack from the JavaScript host. It's got a very strong isolation model as well. Separate WebAssembly modules do not share any memory. They don't share any stack. It's the security model you'd expect on the web. What they can do is they can import and export functions.

Chasm v0.2 - Expressions

The next step in my language is I want to add some simple expression support so I can do some basic math. Let's have a look at the modifications I need to apply to the tokenizer, the parser, and emitter to achieve that. I'm not going to go through all the regular expressions again. For my tokenizer, all I had to do was add a couple of other regular expressions and I was able to output parenthesis tokens and operators. A couple more lines of code and I'm done with the tokenizer. My parser, I don't have to change the statement parsing logic. All I have to do is add a little bit more to my expression parser, so that this time it can cope with parentheses. What this does is after eating the token, the first token is the parenthesis. That doesn't convey any semantics. I parse the left-hand expression recursively. I store the operator. Then I parse the right-hand expression, recursively, to allow me to create expression trees. For example, this print statement results in that abstract syntax tree. Here, the tree-like structure is starting to emerge. Again, a few simple modifications to my parser.
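The recursive step might be sketched like this. It assumes every binary expression is fully parenthesised, as in the talk; the token and node shapes are illustrative assumptions.

```typescript
type Token = { type: "number" | "operator" | "parens"; value: string };

type Expression =
  | { type: "numberLiteral"; value: number }
  | { type: "binaryExpression"; left: Expression; right: Expression; operator: string };

function parseExpression(tokens: Token[]): Expression {
  let position = 0;
  // Return the current token and advance the cursor.
  const eatToken = (): Token => {
    const token = tokens[position++];
    if (token === undefined) throw new Error("Unexpected end of input");
    return token;
  };

  const parseNext = (): Expression => {
    const token = eatToken();
    if (token.type === "number") {
      return { type: "numberLiteral", value: Number(token.value) };
    }
    if (token.type === "parens" && token.value === "(") {
      // The parenthesis itself carries no semantics; it only marks a nested expression.
      const left = parseNext();
      const operator = eatToken().value;
      const right = parseNext();
      eatToken(); // the closing parenthesis
      return { type: "binaryExpression", left, right, operator };
    }
    throw new Error(`Unexpected token: ${token.value}`);
  };

  return parseNext();
}

// Tokens for ((1 + 2) * 3)
const expressionTree = parseExpression([
  { type: "parens", value: "(" },
  { type: "parens", value: "(" },
  { type: "number", value: "1" },
  { type: "operator", value: "+" },
  { type: "number", value: "2" },
  { type: "parens", value: ")" },
  { type: "operator", value: "*" },
  { type: "number", value: "3" },
  { type: "parens", value: ")" },
]);
```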

Finally, the emitter. If you recall my previous expression, in my previous emitter, I only understood numeric literals. That's the code you can see in the middle. What I want is for it to now understand the tree-like structure. What I did was I added a traverse function that walks the abstract syntax tree. There are various different tree walking algorithms. This is a depth-first post-order traversal. What this does is it visits the left-hand node, then the right-hand node, then the parent. This reflects the stack-like nature of WebAssembly. You'd have the left thing and the right thing, and the bit at the top. This is a tree walker. Then when it encounters the operator, all it has to do is output the WebAssembly instruction which relates to that particular operator. Just a few lines of code change.
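A depth-first post-order walk of that kind can be sketched in a few lines. The `traverse` name comes from the talk; the rest, including the `ieee754` helper and the operator table, is an assumed reconstruction.

```typescript
type Expression =
  | { type: "numberLiteral"; value: number }
  | { type: "binaryExpression"; left: Expression; right: Expression; operator: "+" | "-" | "*" | "/" };

// f32.add, f32.sub, f32.mul and f32.div opcodes.
const operatorOpcodes = { "+": 0x92, "-": 0x93, "*": 0x94, "/": 0x95 };

// Encode a number as a little-endian IEEE 754 single-precision float.
function ieee754(value: number): number[] {
  const buffer = new ArrayBuffer(4);
  new DataView(buffer).setFloat32(0, value, true);
  const bytes = new Uint8Array(buffer);
  return [bytes[0], bytes[1], bytes[2], bytes[3]];
}

// Post-order: left child, then right child, then the node itself.
function traverse(node: Expression, visit: (node: Expression) => void): void {
  if (node.type === "binaryExpression") {
    traverse(node.left, visit);
    traverse(node.right, visit);
  }
  visit(node);
}

function emitExpression(root: Expression): number[] {
  const code: number[] = [];
  traverse(root, node => {
    if (node.type === "numberLiteral") {
      code.push(0x43, ...ieee754(node.value)); // f32.const
    } else {
      code.push(operatorOpcodes[node.operator]);
    }
  });
  return code;
}

// (1 + 2) * 3
const expressionCode = emitExpression({
  type: "binaryExpression",
  operator: "*",
  left: {
    type: "binaryExpression",
    operator: "+",
    left: { type: "numberLiteral", value: 1 },
    right: { type: "numberLiteral", value: 2 },
  },
  right: { type: "numberLiteral", value: 3 },
});
```

The emitted order is const 1, const 2, add, const 3, mul, so both operands are already on the stack when each operator executes.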

Demo

Let's do some math. Has anyone got any favorite math? Let's try to add some numbers together. That works. Let's do some tree-like stuff. That works too.

Participant 1: 22 divided by 7?

Eberhardt: Can I do 22 divided by 7? Yes. I see what you did there. That was pretty cool.

What I found quite interesting about that is my first iteration of my language, my version 0.1, I had to do quite a lot of work to put the basic building blocks in place to support simple print statements. I put in a lot of effort to create something that did very little. Because of those foundations, because of my tokenizer, my parser, and the emitter, I was then able to expand my language to do something actually quite useful by just adding two or three lines of code to my tokenizer, a few lines of code to my parser, and just a few lines of code to my emitter. The structure, the foundation I had in place is really starting to work for me.

Chasm v0.3 - Variables and While Loops

Let's have a look at how you would add some slightly more advanced programming constructs, variables and while loops. This time, I'm not going to visit all the component parts. I'm going to talk at a slightly higher level. I hope you understand the basic building blocks, the basic foundation of what's supporting this.

Variables

Variables are really quite straightforward. Let's take this noddy program var f = 23, print f. What does that turn into with WebAssembly? WebAssembly functions have the concept of local variables. This is a function that has a single local variable that is of type f32. What we're doing within the body of this function is we're pushing the constant 23 onto the stack. Then using the set_local operation, to set the local value at the 0 index with the value that's currently on the stack. That's how you work with variables within WebAssembly: set_local and get_local. I'm not going to go through the underlying code for the tokenizer. That's a trivial change. The parser. That's a trivial change. With the emitter, all I've got to do is keep track of the names of the various variables and the index at which they reside. That's it. It's really simple to add variables.
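The bookkeeping described here, mapping variable names to local indices, might be sketched as below. The statement shapes are simplified assumptions (the initializer is a plain number rather than an expression), and indices are assumed to stay below 128 so that each fits in a single LEB128 byte.

```typescript
type Statement =
  | { type: "variableDeclaration"; name: string; initializer: number }
  | { type: "printStatement"; identifier: string };

// Encode a number as a little-endian IEEE 754 single-precision float.
function ieee754(value: number): number[] {
  const buffer = new ArrayBuffer(4);
  new DataView(buffer).setFloat32(0, value, true);
  const bytes = new Uint8Array(buffer);
  return [bytes[0], bytes[1], bytes[2], bytes[3]];
}

function emitStatements(statements: Statement[]): { localCount: number; code: number[] } {
  // A variable's position in this array is its local index.
  const symbols: string[] = [];
  const code: number[] = [];

  for (const statement of statements) {
    if (statement.type === "variableDeclaration") {
      let index = symbols.indexOf(statement.name);
      if (index === -1) {
        index = symbols.length;
        symbols.push(statement.name);
      }
      code.push(0x43, ...ieee754(statement.initializer)); // f32.const <value>
      code.push(0x21, index);                             // set_local <index>
    } else {
      const index = symbols.indexOf(statement.identifier);
      if (index === -1) throw new Error(`Undeclared variable: ${statement.identifier}`);
      code.push(0x20, index); // get_local <index>
      code.push(0x10, 0x00);  // call 0, assumed to be the imported print function
    }
  }
  // localCount tells the caller how many f32 locals the function must declare.
  return { localCount: symbols.length, code };
}

// var f = 23, print f
const variableProgram = emitStatements([
  { type: "variableDeclaration", name: "f", initializer: 23 },
  { type: "printStatement", identifier: "f" },
]);
```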

While Loops

While loops are really quite simple based on these building blocks. The equivalent of a while, endwhile in my noddy language, in WebAssembly is a block with a nested loop. This is one of the interesting things about WebAssembly. Whilst it is a relatively low-level programming language, and it's got an assembly language style feel to it, it actually has a few more high-level programming constructs. It has functions. It has loops. It has roughly the equivalent to a switch case. It has some slightly more high-level concepts than you would typically imagine. This is how a while loop works. We have a block with a nested loop. We evaluate the loop condition, which leaves the value of the condition on the stack. Then i32.eqz tests whether the value on the stack equals 0. Then br_if, branch if, tests the value at the top of the stack and branches if it is non-zero, which, because i32.eqz has just inverted it, means the loop condition was false. How the branch works in WebAssembly is it branches to a particular depth. We have a branch to depth 1 which breaks out of the loop, or we have a branch to depth 0 which repeats the loop. We've got nested statements. The loop condition is itself an expression. We already have those component parts within the emitter. I'm not going to show you all the code, but really, it was a small addition.
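The block-with-nested-loop pattern can be sketched as an `emitWhile` helper (a hypothetical name) that wraps already-emitted condition and body code. To show it running, the sketch hand-assembles a small module whose exported `run(limit)` counts a local up from zero while it is less than `limit`; that test program and the single-byte section lengths are assumptions made for brevity.

```typescript
const Op = {
  block: 0x02, loop: 0x03, br: 0x0c, br_if: 0x0d, end: 0x0b,
  get_local: 0x20, set_local: 0x21, f32_const: 0x43,
  i32_eqz: 0x45, f32_lt: 0x5d, f32_add: 0x92,
};
const f32 = 0x7d;
const voidBlock = 0x40; // block type for blocks that leave nothing on the stack

function emitWhile(condition: number[], body: number[]): number[] {
  return [
    Op.block, voidBlock, // outer block: branching to it exits the loop
    Op.loop, voidBlock,  // inner loop: branching to it repeats the loop
    ...condition,        // leaves an i32 on the stack: 1 for true, 0 for false
    Op.i32_eqz,          // invert it: 1 when the condition is false
    Op.br_if, 1,         // condition false: branch to depth 1, out of the block
    ...body,
    Op.br, 0,            // branch to depth 0: back to the top of the loop
    Op.end,              // end loop
    Op.end,              // end block
  ];
}

// Single-byte lengths are an assumption that holds while content is under 128 bytes.
const section = (id: number, content: number[]): number[] => [id, content.length, ...content];

// Parameter 0 is the limit, local 1 is the counter i (locals start at zero).
const loopCode = emitWhile(
  [Op.get_local, 1, Op.get_local, 0, Op.f32_lt],                                         // i < limit
  [Op.get_local, 1, Op.f32_const, 0x00, 0x00, 0x80, 0x3f, Op.f32_add, Op.set_local, 1]  // i = i + 1
);
// One group of one f32 local, the loop, then return i.
const functionBody = [0x01, 0x01, f32, ...loopCode, Op.get_local, 1, Op.end];

const whileModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  ...section(0x01, [0x01, 0x60, 0x01, f32, 0x01, f32]),        // type: (f32) -> f32
  ...section(0x03, [0x01, 0x00]),                               // one function of type 0
  ...section(0x07, [0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x00]), // export "run"
  ...section(0x0a, [0x01, functionBody.length, ...functionBody]),
]);

const whileInstance = new WebAssembly.Instance(new WebAssembly.Module(whileModule));
const countTo = whileInstance.exports.run as (limit: number) => number;
const loopResult = countTo(5);
```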

Demo

Let's see if I can get that to work. Let's create a variable, while, let's have, I. With my programming language, I always need to add brackets. What you could do at the parser level is you could employ BIDMAS or BODMAS, so that you don't actually need that. You can actually implement that within the parser and not change the emitter. Just haven't gotten around to doing that. Let's do print I, I = I +1. We now have while loops and print statements. We're starting to get there. We're starting to get the basic mathematics required, the looping constructs to be able to render the Mandelbrot set.

Chasm v1.0 - Setpixel

That takes me up to the final iteration, my MVP, my Version 1 release that allows me to render the Mandelbrot set. The one interesting addition here is that it requires a way to render, effectively, a bitmap image. What I could do is import a function from my JavaScript host that renders pixels on my behalf. If I had a 100 by 100 image, that would require 10,000 invocations. There is a performance impact to invoking functions across the WebAssembly JavaScript boundary. No matter which direction you call in, there will be a performance overhead. That would be a pretty inefficient way of doing it.

When I showed this diagram before, I mentioned that the only way you can perform I/O with WebAssembly is via imported and exported functions. That's actually not quite the whole truth. There's another very interesting way that you can interoperate and transfer values between WebAssembly and the host. That's via linear memory. Every WebAssembly module has the option of allocating a contiguous block of memory. It can use its i32 load and store and f32 load and store operations to store values within memory. If, for example, you're using Rust or C#, and you're passing classes, structs, and objects around, what's actually happening is that these are being encoded into linear memory. It's a little bit like your heap. Your JavaScript host, as well as exporting and importing functions, can have direct access to linear memory as well. That's exposed to JavaScript as an array buffer. You can use that as a way of exchanging larger, more structured pieces of memory and values and variables. If, for example, you're a Rust developer, and you return a struct to your JavaScript code, that has to be serialized and deserialized via linear memory. For my purpose, all I need to do is name a location within linear memory. That basically becomes my screen buffer. I can write to that with my WebAssembly code. Then I can pull the data straight into an HTML canvas with my JavaScript code.
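The screen-buffer idea can be sketched from the host side roughly as follows. The layout is an assumption for this example (a 100 by 100 image, one byte per pixel, starting at offset 0), and `setPixel` and `toRgba` are hypothetical helpers; in the real setup the compiled module would write the bytes itself with store instructions. `WebAssembly.Memory`, `ImageData`, and `putImageData` are the standard APIs.

```typescript
// Assumed layout for illustration: a 100x100 image, one byte per pixel,
// starting at byte offset 0 of linear memory.
const WIDTH = 100;
const HEIGHT = 100;
const SCREEN_OFFSET = 0;

// One 64 KiB page is enough for 10,000 one-byte pixels.
const memory = new WebAssembly.Memory({ initial: 1 });

// Stand-in for the compiled module's pixel writes: a real module would
// store these bytes itself rather than calling into JavaScript.
function setPixel(buffer: ArrayBuffer, x: number, y: number, value: number): void {
  const pixels = new Uint8Array(buffer, SCREEN_OFFSET, WIDTH * HEIGHT);
  pixels[y * WIDTH + x] = value;
}

// Host side: read the whole buffer once and expand each one-byte pixel
// into opaque greyscale RGBA, the format a canvas expects.
function toRgba(buffer: ArrayBuffer): Uint8ClampedArray {
  const pixels = new Uint8Array(buffer, SCREEN_OFFSET, WIDTH * HEIGHT);
  const rgba = new Uint8ClampedArray(WIDTH * HEIGHT * 4);
  pixels.forEach((value, index) => {
    rgba.set([value, value, value, 255], index * 4);
  });
  return rgba;
}

setPixel(memory.buffer, 10, 20, 200);
const frame = toRgba(memory.buffer);
// In a browser, with a 2D canvas context named `context`:
//   context.putImageData(new ImageData(frame, WIDTH, HEIGHT), 0, 0);
```

The host reads all 10,000 pixels with a single pass over the buffer, so the boundary is crossed once per frame rather than once per pixel.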

I typed that really quickly. If I run my compiler, I now get the Mandelbrot set.

Recap

WebAssembly itself is a relatively simple virtual machine. What I mean by simple is I can understand it. If you've ever done any programming with relatively simple processors, you'll look at WebAssembly and you'll go, "That feels familiar." It's a bit like going back to 8-bit programming. It's really that simple. I've been doing web technology for many years. I came to the conclusion, quite a few years ago, that I'm never really going to understand it because it gets more complicated with each year. I can use React. I know what the API is. I don't know how it works under the hood. That's the same for so many web technologies. Whereas with WebAssembly, it's relatively easy. I give any of you a week and a book, and you'll understand it completely, which is quite a refreshing change for modern web technology. It's also a fun playground. One of the other things I've done: I always wanted to write an emulator. I wanted to write a Commodore or Amiga emulator. I found out that was really hard. I set my sights on a Game Boy, which is a bit less hard. I found this machine called CHIP-8, which is a really simple one to emulate. I had a go at writing that, creating a CHIP-8 emulator that runs on WebAssembly.

I wrote all of this in TypeScript. My day job doesn't give me enough opportunity to use TypeScript. I found out TypeScript was far better than I'd ever imagined. With my abstract syntax tree, for example, there's a string literal that indicates the node type. With TypeScript, I can actually define the shape of the node based on that string literal. I never knew that was possible.
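The TypeScript feature described here is usually called a discriminated union. A minimal sketch, with hypothetical node shapes rather than the actual chasm AST, might look like this:

```typescript
// Hypothetical node shapes; the real chasm AST may differ.
interface NumberLiteralNode {
  type: "numberLiteral";
  value: number;
}

interface BinaryExpressionNode {
  type: "binaryExpression";
  operator: "+" | "-" | "*" | "/";
  left: ExpressionNode;
  right: ExpressionNode;
}

// The `type` string literal is the discriminant of the union.
type ExpressionNode = NumberLiteralNode | BinaryExpressionNode;

const OPERATIONS: Record<
  BinaryExpressionNode["operator"],
  (a: number, b: number) => number
> = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

function evaluate(node: ExpressionNode): number {
  switch (node.type) {
    case "numberLiteral":
      // Narrowed to NumberLiteralNode, so `value` is known to exist.
      return node.value;
    case "binaryExpression":
      // Narrowed to BinaryExpressionNode, so `left` and `right` exist.
      return OPERATIONS[node.operator](evaluate(node.left), evaluate(node.right));
  }
}

// 1 + (2 * 3)
const result = evaluate({
  type: "binaryExpression",
  operator: "+",
  left: { type: "numberLiteral", value: 1 },
  right: {
    type: "binaryExpression",
    operator: "*",
    left: { type: "numberLiteral", value: 2 },
    right: { type: "numberLiteral", value: 3 },
  },
});
```

Inside each `case`, the compiler knows which fields the node has, so reading `node.value` on a binary expression would be a compile-time error.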

Creating a simple compiler isn't actually that hard. My compiler is not a great compiler, but building up the component parts iteratively was relatively easy. It's a good way to exercise your programming skills. As I went along, I did very much test-driven development, so lots of unit tests, and so on. Also, there are a number of core programming concepts that emerge time and again when you're developing this kind of thing, whether it's the tree traversal or the tree walker with the observer pattern. Loads of really well-known, generic software engineering patterns emerge when doing this kind of thing. Often, when you're working on your day job creating shopping cart systems, you don't really have the opportunity to leverage some of these patterns.
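As one example of the tree-walking idea, here is a small sketch of a traversal that hands every node to a callback. The `TreeNode` shape and the `traverse` helper are assumptions for illustration, not the compiler's actual code. The walk is post-order, which suits a stack machine because operands are visited before the operation that consumes them.

```typescript
// Hypothetical node shape for illustration only.
interface TreeNode {
  type: string;
  children?: TreeNode[];
}

type Visitor = (node: TreeNode) => void;

// Post-order walk: visit all children first, then the node itself.
function traverse(node: TreeNode, visit: Visitor): void {
  for (const child of node.children ?? []) {
    traverse(child, visit);
  }
  visit(node);
}

const tree: TreeNode = {
  type: "printStatement",
  children: [
    {
      type: "binaryExpression",
      children: [{ type: "numberLiteral" }, { type: "numberLiteral" }],
    },
  ],
};

// Record the order in which node types are visited.
const visited: string[] = [];
traverse(tree, (node) => visited.push(node.type));
```

Here the two literals are visited first, then the binary expression, and the print statement last.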

There's a lot of creative energy being poured into WebAssembly at the moment. It's a very new technology. The first official version came out in 2017 or 2018. It's only just been accepted as a W3C standard, actually, which in itself is incredibly important. W3C oversees the three languages of the web: CSS, JavaScript, and HTML. They've been doing that for decades. As of a few months ago, it's the very first time that another language has been added to W3C. They now oversee the four languages of the web. This is the first time in the 25-year history of languages for the web that we've actually got something other than JavaScript. Silverlight and Flash did not count. They went in through the backdoor. WebAssembly goes through the front.

Hopefully, you have been inspired. With WebAssembly being such a new technology, there are a lot of opportunities out there to help build the ecosystem around WebAssembly. Whatever language you know, whatever language you love, there will be a vibrant community gravitating around how they can leverage WebAssembly. There's lots of amazing stuff going on with Rust, with C#, with Python. You name a language, you'll find a community that's deeply interested in getting that to work on the web.

Finally, I got to tick off another item on my bucket list. It looks almost complete, which is not quite true because I found that languages and compilers become an utter time sink. Once I got that far, I thought, "I want to add strings." WebAssembly only has numeric types. I have to think about allocators or garbage collectors. I want to add arrays, and functions, lambdas, objects. This is a pet project that I think will live on for quite some time. The code is all up on GitHub.

Questions and Answers

Participant 2: I've never worked with WebAssembly before, but can you download it? For example, if you have your project, your UI or something, can you compile it to WebAssembly and add this as a reference? Instead of sending a bunch of JavaScript, the client may need a compiled version of it. Is that possible?

Eberhardt: You're wondering how you use it on your everyday JavaScript project. JavaScript is a language that is fairly challenging to compile to WebAssembly, for two reasons. Firstly, it doesn't have static typing. Secondly, it requires garbage collection. WebAssembly does not have a garbage collector. On those two points, there's a project called AssemblyScript, which is gaining quite a bit of traction, where they use TypeScript to provide the static typing to allow compilation. They also have a lightweight garbage collector in there. You can potentially move parts of your JavaScript code over to WebAssembly.

Participant 2: Let's say that my company wants to create WebAssembly for a project. Can I reference that, and send it to the client?

Eberhardt: How do you actually transfer it to the client? There are a few different ways of doing it. There are the HTTP fetch APIs. The JavaScript host can fetch it directly. There's also a proposal in flight which integrates WebAssembly into the ECMAScript module system, the same way you can now asynchronously download modules.
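To make the fetch route concrete, here is a minimal sketch using the standard WebAssembly JavaScript API. The bytes are a hand-assembled module exporting an `add` function over two i32 values, standing in for whatever a server would send; the i32 signature and the `add.wasm` file name are assumptions for this example. The module is instantiated synchronously only because it is tiny; in a browser you would normally use the streaming call shown in the comment.

```typescript
// In a browser the bytes usually arrive over HTTP, for example
// (the file name "add.wasm" is hypothetical):
//   const { instance } = await WebAssembly.instantiateStreaming(fetch("add.wasm"));

// The same kind of bytes a server would send: a hand-assembled module that
// exports add(i32, i32) -> i32.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, // local.get 0
  0x20, 0x01, // local.get 1
  0x6a, // i32.add
  0x0b, // end
]);

// Synchronous compilation is fine for a module this small.
const addModule = new WebAssembly.Module(addModuleBytes);
const addInstance = new WebAssembly.Instance(addModule);

// Exports are untyped at this boundary, so the signature is asserted here.
const add = addInstance.exports.add as (a: number, b: number) => number;
const sum = add(2, 3);
```

Once instantiated, the exported function is called like any other JavaScript function.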

Participant 2: Particularly, I can put it as a text in my JavaScript and use it.

Eberhardt: Yes. In the near future, you'll just be able to import WebAssembly.

Participant 2: I believe the binary version is very small, but the text version is going to be really large.

Eberhardt: You always send it as WASM in the binary format, over HTTP. It will integrate seamlessly with your standard JavaScript tooling over time. At the moment, you can do that with various open source plugins. Having it moved to W3C means it's a lot closer to the web standards.

Participant 3: You mentioned that there wasn't any memory management built in WebAssembly, is that correct?

Eberhardt: Correct. It's a blank canvas, basically.

Participant 3: It is possible that some standards may appear in the future. Is there a way to reuse WebAssembly code like we have in C with shared or statically linked libraries?

Eberhardt: Yes. The way people are tackling that at the moment is the Rust community have a project called wee_alloc, which is a very tiny memory allocator that works with WebAssembly. If you're using Blazor, for example, that's a project that takes C# and compiles it to WebAssembly. That requires a garbage collector. What they've done is they've compiled their garbage collector to run within WebAssembly itself. As well as downloading the code, you're downloading a runtime as well, which is quite an expensive way of doing it.

Participant 3: Within the same image or a different image?

Eberhardt: Within the same WebAssembly module. There is a proposal in flight at the moment for adding garbage collection to WebAssembly. WebAssembly will never have its own garbage collector. What they're doing is they're putting in place the mechanisms required for it to work alongside the host garbage collector. Your browser's JavaScript garbage collector will be able to manage the WebAssembly linear memory, which is, I think, the best of both worlds.

Participant 4: In terms of the GC, if that's delegated to the host, is there actually an advantage in terms of the GC not being part of WebAssembly.

Eberhardt: I think it makes sense for WebAssembly to be agnostic of the way memory is managed, because languages are different. Some are garbage collected. Some have automatic reference counting. Rust has its own concept of ownership. I don't think it makes sense for WebAssembly to have built-in memory management capabilities. I think they're going in the right direction by doing that. Also, at runtime, why would you want two garbage collectors when ideally you can have one garbage collector managing your memory? It feels like the right way, to me.

Participant 5: Do you think that at some point in the future WebAssembly will replace JavaScript as the language that is used on the web?

Eberhardt: I don't think anyone involved in WebAssembly is considering it as a replacement for JavaScript on the web. It's something that will always work alongside JavaScript on the web. Where it will replace other things is outside of the web. As an example, if you've done any programming on blockchain, you'll know that you create smart contracts. Ethereum is the best-known blockchain for smart contract execution. On their roadmap, next year, they're going to be replacing their current virtual machine with a WebAssembly-based virtual machine. The reason they're doing that is because their current virtual machine only runs Solidity, which is their own smart contract language. Whereas in the future, through WebAssembly, you'll be able to write smart contracts in JavaScript if you wish. On the web, it's not going to be a replacement. The vast majority of what we do on the web is simple form filling. JavaScript is really good at that. Whereas there are pockets of functionality, if you want to make a video editor, for example, where you might have the core runtime in WebAssembly. All the stuff that works with the DOM, for example, is typically going to be in JavaScript. AutoCAD is a good example. They have a WebAssembly version of AutoCAD, which is a desktop design application. The core is WebAssembly. They still have a JavaScript or TypeScript and React wrapper that sits around that. No, it's never going to replace JavaScript, but it will be a close partner with JavaScript.

Participant 6: Other than JavaScript, is there any other host which can host WebAssembly? The question is primarily coming from a reason that, if I want to build my own rule engine having some domain language given to it and I want to host it in some client application, is it possible to build something like that?

Eberhardt: Yes, there are other WebAssembly hosts. The main WebAssembly host that I know of is obviously the web. I mentioned blockchain and Ethereum. Basically, any blockchain startup these days is using a WebAssembly virtual machine; things like NEAR Protocol, and Polkadot, and loads of others are all using WebAssembly. Another place where it's starting to gain traction is serverless computing. If you use AWS Lambda and so on, the actual execution engine for that is quite heavyweight. The way that it achieves isolation is through quite a large stack of virtual machines, and so on. Whereas WebAssembly allows you to achieve isolation in a very lightweight fashion. Cloudflare and Fastly are offering up serverless functions that run on WebAssembly. Intel are looking at it for some IoT applications as well. I think we're going to see WebAssembly cropping up all over the place. To be honest, I think the people that named it are starting to regret it already because it's certainly not just tied to the web.

Participant 7: Building on that answer, does the host language have to be JavaScript?

Eberhardt: No. The host can be anything you like. There are C++ hosts, Rust hosts. It has some interface that allows you to interface it with a whole range of languages. There's a startup called Wasmer, actually, that have their own WebAssembly virtual machine, and they ship a bunch of bindings. Whether you're writing in Ruby, or Python, or Rust, or whatever, they have a set of bindings that allow you to embed a WebAssembly virtual machine alongside your native language as it were. It'll run pretty much anywhere.

Recorded at:

Sep 03, 2020

Colin Eberhardt
