
Build Your Own WebAssembly Compiler


Summary

Colin Eberhardt looks at some of the internals of WebAssembly, explores how it works "under the hood", and shows how to create a (simple) compiler that targets this runtime.

Bio

Colin Eberhardt is the Technology Director at Scott Logic, a UK-based software consultancy where they create complex applications for their financial services clients. He is an avid technology enthusiast, spending his evenings contributing to open source projects, writing blog posts and learning as much as he can.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Eberhardt: My name's Colin Eberhardt. I'm the technology director of a company called Scott Logic, which I doubt any of you have heard about. We're a fairly small UK-based software consultancy that specializes in writing some large-scale applications for financial services. I spend a lot of time writing fairly large-scale JavaScript applications.

I'm also a bit of a fan of WebAssembly as you might guess from the talk title. I run the WebAssembly Weekly newsletter. I've heard it's the most popular WebAssembly newsletter. It's actually the only one, but it's still the most popular. I published the 100th issue just yesterday, so I'm pretty chuffed with that. One of the reasons I'm quite interested in WebAssembly is that there's a lot going on in that community, there's a lot of innovation going on. In fact, just yesterday there was an announcement about a new group called the Bytecode Alliance, which is a new industry partnership between a number of different companies that are looking to make a secure-by-default WebAssembly ecosystem. This is actually intended to tackle some of the problems that were talked about in the previous talk around vulnerabilities in untrusted third-party npm modules.

Why Do We Need WebAssembly?

Why do we need WebAssembly? To me, this slide sums it up really quite nicely. To give you a bit of context, JavaScript was invented about 25 years ago, and it was originally conceived as a relatively simple scripting language to add a little bit of interactivity into the web. These days we're not just writing a few hundred lines of JavaScript. Typically, we're writing thousands or tens of thousands of lines of JavaScript. The tooling we have is really quite advanced compared to that which we had 20 or so years ago. We've got things like TypeScript and Babel. We've got UI frameworks like React that are doing all these really quite clever things, and our tooling is doing all kinds of complex compilation, transpilation, and transforms. Yet the thing that it emits is still JavaScript, this kind of obfuscated, hard-to-read JavaScript. JavaScript is a compilation target. Some have called it the assembly language of the web. You might think, "Yes, that's fine." Well, it's not fine.

To understand why it's not fine, you have to look at how JavaScript is executed within the browser. The first thing that happens is your browser engine receives the characters over HTTP and then it has to parse them into what's known as an abstract syntax tree, and we'll get onto that in a little bit. From there, it's able to generate a bytecode, and at that point, it's able to initially run your application using an interpreter. The thing is, interpreters are pretty slow. JavaScript of 10, 15 years ago was very slow because it was just interpreted. These days the runtime will monitor the execution of your application, and it will make some assumptions. It will make some assumptions about, for example, the types being used, and from there it's able to, just in time, emit a compiled version of your code which runs faster.

Modern browsers have multiple levels of optimization, multiple tiers. Unfortunately, if some of the things they did to optimize your code prove to be invalid, if some of the assumptions are invalid, it has to bail out and go to a slightly lower performance tier. These days JavaScript runs really quite fast, but unfortunately, because of these steps you see in front of you, it takes a long time to get there. What this means is that from an end-user perspective, there's an impact. There's an impact on how long it takes for your JavaScript application to get up and running. The first thing, as I mentioned previously, is your code is parsed, it's compiled, optimized, re-optimized, and eventually executed at a pretty decent speed, and then it's eventually garbage-collected. The way that JavaScript is delivered to the browser today has an impact on the end-users.

What is WebAssembly? On the WebAssembly website, they have this really nice one-line description: "WebAssembly is a new, portable, size- and load-time-efficient format suitable for compilation to the web." I'm just going to pick this apart a little bit. It's portable; as you'd expect, it's a web technology. You expect it to work in Safari, in Chrome, in Edge, for example. It's a size- and load-time-efficient format. It's not an ASCII format, it's a binary format. It's load-time-efficient. It's designed to load and execute quickly. Finally, it's suitable for compilation to the web. JavaScript is not suitable for compilation to the web. It was never ever designed to do that. Whereas WebAssembly was designed with compilation in mind from day one. Also, it was designed to be a compilation target for a wide range of languages, so not just JavaScript - in fact, JavaScript's tricky to compile to WebAssembly - but it was designed for C++, C#, Rust, Java, to bring all of those languages to the web as well.

Finally, to bring that home, the sort of timeline of execution of JavaScript is shown at the top. Contrast that with WebAssembly. All the browser has to do is decode, compile, optimize and execute. This happens much more rapidly when compared to JavaScript. That's why we need WebAssembly.

Why Create a WebAssembly Compiler?

Why create a WebAssembly compiler? Why am I standing here telling you about how to create your own WebAssembly compiler? There's a couple of reasons. The first is something I read at the beginning of the year. Stack Overflow publishes a fascinating survey each year. I think they have around about 20,000 respondents, and they ask people about the languages and the frameworks they use. They also ask them for their sentiments. What languages, tools, frameworks do they love, what do they dread - maybe not hate - what do they enjoy using. Interestingly, WebAssembly was the fifth most loved programming language. I thought, "That's nuts. How many people out there are actually programming in WebAssembly?"

WebAssembly was the only compilation target listed under the most loved languages. I thought, "Maybe people want to know a little bit more about WebAssembly, the language itself." That was one of the ideas that made me think about writing this talk. The next one is that I've always had a bit of a programming bucket list. There are things that I've always wanted to do as a programmer, things like create an open-source project and meet Brendan Eich - which I did, he's a lovely guy. I came from a physics background, so I didn't study computer science, and I never got the opportunity to learn about compilers and so on. I've always wanted to do that. Those two things, coupled together, made me think, "Ok. I'm going to create my own language. I'm going to write a compiler that compiles it to WebAssembly, and then I'm going to do a talk on it." It took me many weeks and months, and there's quite a lot to go through.

I had all kinds of crazy ideas, so in order to constrain my experimentation, I thought, "I'm going to write a programming language that has enough structure to it to achieve a fairly modest goal: a programming language that allows me to render a simple fractal, a Mandelbrot set." This is an example of the language. It's not a very nice-looking language, but it'll do.

Let's take the first step. Let's look at creating the simplest, most trivial wasm module with code. I'm using TypeScript here, so I'm going to construct the simplest wasm module possible. It turns out the simplest wasm module is just eight bytes long. It starts with a module header, and if you're any good with your ASCII codes, I'm sure you'll know that 61 73 6d is "asm". The next is the module version number, which at the moment is version one. Concatenate these together, and you get the simplest possible WebAssembly module. Typically, you wouldn't do this. Typically, you wouldn't construct this in memory. Typically, this would be downloaded over HTTP as a binary wasm module, but you can do either.
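To make that concrete, here is a minimal TypeScript sketch of those eight bytes; the names are illustrative rather than the exact code from the talk:

```typescript
// The wasm magic number ("\0asm") followed by the version (1), eight bytes in total.
const magicModuleHeader = [0x00, 0x61, 0x73, 0x6d];
const moduleVersion = [0x01, 0x00, 0x00, 0x00];

// The simplest possible (and completely empty) WebAssembly module.
const emptyModule = Uint8Array.from([...magicModuleHeader, ...moduleVersion]);
```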

In order to run this, what you have to do is take your binary and instantiate it using the WebAssembly APIs. There's a new set of APIs for JavaScript that allow you to instantiate and interact with WebAssembly modules. This already illustrates some quite interesting properties of WebAssembly. You don't download it into the browser directly from a script tag. The only way to get a WebAssembly module instantiated and interact with it is through the JavaScript host. At the moment, this WebAssembly module does absolutely nothing, but it's still a perfectly valid module. Let's make it a little bit more complicated. Let's try to create a module that does something vaguely useful - a simple add function.
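A hedged sketch of that instantiation step, assuming the emptyModule bytes from the sketch above, might look like this:

```typescript
// Instantiate the hand-crafted bytes via the JavaScript WebAssembly API.
// The module exports nothing yet, but successful instantiation shows it is valid.
WebAssembly.instantiate(emptyModule).then(({ module, instance }) => {
  console.log(instance.exports); // {}
});
```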

Rather than looking at it in the binary format, staring at hex codes, it's a little easier to look at it in the WebAssembly text format. If you've ever done any assembly language programming, typically, when you're doing low-level programming, you'll use assembly language, which is a slightly more human-readable version of the machine code that it represents. WebAssembly has the same kind of two different views. We have the text format and the binary format. This is a very simple valid WebAssembly module that provides an add function. It's a function that has two parameters of type float, 32 bits. It returns a 32-bit result. The way that it works is it gets the two parameters using the get_local opcode, and then it adds the two together. Finally, this is exported to the host so that it can execute it.
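In the WebAssembly text format, the module being described looks roughly like this - a sketch that mirrors the slide rather than reproducing it exactly:

```wat
(module
  (func $add (param f32) (param f32) (result f32)
    get_local 0
    get_local 1
    f32.add)
  (export "add" (func $add)))
```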

This gives us a little bit more insight into what WebAssembly is actually like. It has a relatively simple instruction set. It has an assembly-like instruction set. If you've ever done any assembly language programming, it's got that feel to it. It only has four numeric types. It has two integer types and two floating-point types, and that's it. It's a stack machine, so you'll notice the add operation here: the two get_local instructions set up the stack, they push two values onto the stack, the add instruction pops those two values, adds them together and pushes the result onto the stack. Finally, the function returns the remaining value on the stack.

One other interesting thing is that WebAssembly has absolutely no built-in I/O. It cannot access the DOM, it can't access a file system, it can't do a console log. This is quite important for WebAssembly. It means that the attack surface is nonexistent. The only way for WebAssembly to interact with the DOM or its host is through exported and imported functions. To encode that add function in the binary format, the binaries are arranged into sections. You've seen the header and the version number. Following that, you have the type section and the import section and the function section. These are packed together in sequence. I'm not going to go into too much detail, you don't need to know the ins and outs of all of these. The main reason it's split into these various different sections is to minimize the size of the WebAssembly module. If you have two functions with the same signature, it makes sense to have those all encoded in the type section and reference them later. That's the main difference between the text format and the binary format.
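For reference, the section identifiers used when laying out the binary look something like the following sketch; the numeric values come from the WebAssembly spec, while the enum itself is just illustrative:

```typescript
// Each section is emitted as: section id, section length, section contents.
enum Section {
  type = 0x01,   // function signatures
  import = 0x02, // imported functions, e.g. print
  func = 0x03,   // function declarations (indexes into the type section)
  export = 0x07, // functions exposed to the host
  code = 0x0a,   // the function bodies themselves
}
```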

Let's construct an add function again in code, in this case, using TypeScript. The code is really quite simple. I have an enumeration of my available opcodes. I'm just taking the get_local opcode. Following that, I'm encoding the zero index. This uses an unsigned LEB encoding. All you need to know is that that's a very standard, variable-length encoding. It's between one and four bytes long. Next, I'm encoding my get_local opcode and finally the add opcode, and that's it, that's my WebAssembly code. The next thing I need to do is package that up into a function body, and again, this is using some very simple encoding; all the encodeVector function does is prefix my vector with the length, and that's it.
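A hedged sketch of those pieces, with illustrative names, might look like this; the opcode values are taken from the WebAssembly spec:

```typescript
enum Opcodes {
  get_local = 0x20,
  f32_add = 0x92,
}

// Unsigned LEB128: a standard variable-length integer encoding, one to four bytes here.
const unsignedLEB128 = (n: number): number[] => {
  const buffer: number[] = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) {
      byte |= 0x80;
    }
    buffer.push(byte);
  } while (n !== 0);
  return buffer;
};

// The body of the add function: get both parameters, then add them.
const code = [
  Opcodes.get_local, ...unsignedLEB128(0),
  Opcodes.get_local, ...unsignedLEB128(1),
  Opcodes.f32_add,
];
```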

Finally, I'm constructing my code section by encoding my vector of functions together. This is pretty much it. It doesn't matter if you don't understand every single line here. All you need to really understand is that it's relatively simple to handcraft these WebAssembly modules. Finally, I'm able to instantiate the wasm module and invoke the exported function. If I look at the output in binary format, again, I can see the actual code at the end there. If you recall, the get_local opcode had a hex code of 20, for example. All really simple.
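Building on the sketch above, encodeVector and the final instantiation might look roughly like this; the assembly of the remaining sections is elided, so `wasm` below stands in for the fully assembled module bytes:

```typescript
// encodeVector simply prefixes a byte vector with its (LEB-encoded) length.
const encodeVector = (data: number[]): number[] => [
  ...unsignedLEB128(data.length),
  ...data,
];

// The function body: local declarations (none), the code from above, then the 0x0b "end" opcode.
const functionBody = encodeVector([0x00, ...code, 0x0b]);

// Instantiate the assembled module and invoke the exported function.
declare const wasm: Uint8Array; // stand-in for the complete module bytes
WebAssembly.instantiate(wasm).then(({ instance }) => {
  const add = instance.exports.add as (a: number, b: number) => number;
  console.log(add(1.5, 2.5)); // 4
});
```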

Building a Compiler

Let's start looking at how we can turn this simple example into a compiler. Before delving into the detail, I just want to get a little bit of terminology out of the way if you're not quite familiar with it. My language is comprised of statements. At the top level, I just have a collection of statements. Here's one example, a variable declaration statement. As you can imagine, this declares the variable b and assigns it the value zero. Here's the variable assignment statement; it assigns a new value to an existing variable. Interestingly, statements can have other statements nested within them, as is the case with the while statement. Another important component of the language is a concept called expressions. Expressions return values, whereas statements are void, they do not return values. Finally, here, we see an expression tree. Expressions can be composed using brackets and operations. These are the basic building blocks of my language.
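The speaker mentions later that the abstract syntax tree is modelled as TypeScript interfaces; a hedged sketch of what those building blocks might look like (the names are illustrative) is:

```typescript
interface NumberLiteralNode {
  type: "numberLiteral";
  value: number;
}

interface PrintStatementNode {
  type: "printStatement";
  expression: ExpressionNode;
}

// Only number literals exist so far; binary expressions, identifiers, while
// statements and so on are added in later iterations of the language.
type ExpressionNode = NumberLiteralNode;
type StatementNode = PrintStatementNode;
type Program = StatementNode[];
```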

Then we'll look at the basic building blocks of the compiler itself. You might have heard of these terms before. I'd certainly heard of these terms, but I hadn't had the chance to explore them personally. The first thing that happens is my code is processed by my tokeniser into an array of tokens. It's then parsed into an abstract syntax tree, and then finally it emits my wasm binary. We're going to visit each one of these in turn.
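Expressed as type signatures, the three stages fit together something like this - an illustrative outline rather than the talk's exact definitions:

```typescript
interface Token {
  type: string;
  value: string;
  index?: number; // retained for future debug support
}

type Tokenizer = (source: string) => Token[];
type Parser = (tokens: Token[]) => StatementNode[]; // the abstract syntax tree
type Emitter = (ast: StatementNode[]) => Uint8Array; // the wasm binary
```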

For the first version, the 0.1 version of my language, which is called chasm - I asked people on Twitter for some good programming language names, and 99% of them were terrible. Some of them weren't even repeatable. This is version 0.1 of my programming language. My first iteration was a programming language which did nothing more than allow me to print numbers, and that was it.

Let's look at the tokenizer for this language. Rather than looking at the code of the tokenizer, I thought it's easier to actually look at what this code does. It's only about 15 lines of code, and it's comprised of a number of regular expressions which match the various building blocks of my language. The top regular expression we have here matches one or more digits or decimal points - bonus points if you noticed that it's not a terribly robust regular expression. I can have multiple decimal points, but we'll gloss over that. My next regular expression matches a couple of my keywords, "print" and "var," and the final one matches whitespace.
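A hedged sketch of those matchers (the patterns are illustrative, not the exact ones from the talk):

```typescript
const matchers = [
  { type: "number", regex: /^[.0-9]+/ },      // digits and decimal points - not terribly robust!
  { type: "keyword", regex: /^(print|var)/ }, // the keywords supported so far
  { type: "whitespace", regex: /^\s+/ },      // matched, then thrown away
];
```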

The tokenizer advances through the input, which is my program written in chasm, and matches these tokens one after another. At the first location here, the whitespace pattern matches, and that does nothing. As it advances to the next location, my keyword regular expression matches, and this causes it to push a token to the output. It then advances to the next whitespace again, which is ignored. Finally, it matches the number token, which is pushed to the output. Again, it's retaining the index at which it matches for future debug support, which I haven't implemented yet.

The output here is just a couple of tokens. What we see here is that it's removed some of the whitespace. In my language, whitespace is not semantic. It has no meaning, so it can be disposed of. The tokenizer also provides some basic validation of the syntax. The ability to tokenize a text string doesn't necessarily mean it's executable, but it will throw an error, for example, if it comes across a keyword which isn't valid in your language.

The next step is a little bit more complicated. This is our parser. The parser takes the tokens as its input. There's a little bit more code going on here. I'm going to draw your attention to certain parts of it, so don't worry if you don't understand all of it. Just like the tokenizer, this advances step-by-step. We have a pointer to the current token, and we also have a function that eats the current token and advances to the next token in the input. We've got some code, which I'll elaborate on shortly. This is the main body of my parser. As I mentioned previously, my language is comprised of a collection of statements, so my parser is set up just like that. It expects that each token will be the start of the next statement in the language. If for whatever reason the tokens do not conform to that, an exception will occur, and it's caught elsewhere.

Let's look at the statement parser. At the moment, my language does nothing more than print numbers. The only token type I'm expecting here is a keyword, and in this case, the keyword value is print. In future, there'll be more of them. It eats the token, and the next thing it does is it advances to the expression. Each print statement is followed by an expression. Here is the expression parser. Again, the language does nothing more than print simple numeric values. The expression parser is very simple. It matches the type, which is always number, converts this number string into a real numeric type, and eats the token, and that's it. The output of my parser is the abstract syntax tree. Here you can see these two tokens are converted into an abstract syntax tree, which is a single print statement and a single number literal expression. That's the transformation taking place here.
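Putting those pieces together, a hedged sketch of the parser for the print-only language, building on the Token and AST types sketched earlier, might look like this:

```typescript
const parse = (tokens: Token[]): StatementNode[] => {
  let index = 0;
  const eatToken = (): Token => tokens[index++];

  const parseExpression = (): ExpressionNode => {
    const { value } = eatToken();
    return { type: "numberLiteral", value: Number(value) };
  };

  const parseStatement = (): StatementNode => {
    eatToken(); // consume the "print" keyword
    return { type: "printStatement", expression: parseExpression() };
  };

  const nodes: StatementNode[] = [];
  while (index < tokens.length) {
    nodes.push(parseStatement());
  }
  return nodes;
};
```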

The final step is the emitter. Again, a little bit of code going on here. The emitter iterates over each of the statements, and here it switches on the statement type. The only statement type is print at the moment. The first thing it does is it emits the expression that relates to the print statement. This is because WebAssembly is a stack machine. The print operation expects to have the value already present on the stack, so we emit the expression first. Here, the only expression type again is a numeric literal at the moment. All we do is we take the node value, and we emit the f32_const opcode. That's a constant value opcode using the IEEE 754 encoding, which I'm sure you all know. Finally, the print statement itself is implemented as a call. I'll get onto that in a little bit, we'll skip that for the time being.
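A hedged sketch of that emitter, building on the earlier sketches (unsignedLEB128 and the AST types); the opcode values come from the wasm spec, and ieee754 packs a 32-bit float into its four little-endian bytes:

```typescript
const f32_const = 0x43;
const call = 0x10;

const ieee754 = (n: number): number[] =>
  Array.from(new Uint8Array(Float32Array.of(n).buffer));

const codeFromAst = (ast: StatementNode[]): number[] => {
  const code: number[] = [];

  const emitExpression = (node: ExpressionNode) => {
    // numberLiteral is the only expression type so far
    code.push(f32_const, ...ieee754(node.value));
  };

  ast.forEach(statement => {
    switch (statement.type) {
      case "printStatement":
        emitExpression(statement.expression);  // value first (stack machine)
        code.push(call, ...unsignedLEB128(0)); // then call imported function 0, i.e. print
        break;
    }
  });

  return code;
};
```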

It's time for a demo. If the demo gods are on my side, I should be able to show chasm working. Someone name a number.

Participant 1: Forty-two.

Eberhardt: If I run my compiler, the output is 42. Just to recap, what's happening here is my tokenizer, my parser, and emitter are all written in TypeScript and compiled to JavaScript - because the browser runs JavaScript. Within the browser, it's translating that simple application into a WebAssembly module and executing it. I should be able to print something else. If I run that, I can print multiple statements. This is what it looks like. I'm a bit of a magician because I knew you were going to say 42, and that was not a setup. I've done this a couple of times, and the first time someone said 42 I thought, "I bet everyone says 42," and they do.

As you can see, here's the tokenized output, the abstract syntax tree, and the final WebAssembly module. You can see it all together. As I promised, I said I'd return to the print statement. As I mentioned, WebAssembly has no built-in I/O. In order to perform my print statement, I want to do effectively a console log, so I have to work with the JavaScript host in order to achieve that. WebAssembly modules can import and export functions. By importing a function, it's able to execute a JavaScript function, and by exporting one, it allows the JavaScript host to execute one of the WebAssembly functions. That's how WebAssembly performs I/O. For example, if you want to do something a little bit more meaningful, like interact with the DOM, you have to do it through function imports and exports.
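A hedged sketch of that wiring: the module imports a print function from the JavaScript host, which simply forwards to console.log. The "env" namespace and the "run" export name are assumptions for illustration, not necessarily the names used in chasm:

```typescript
const importObject = {
  env: {
    print: (value: number) => console.log(value), // the imported "print" implementation
  },
};

declare const wasm: Uint8Array; // the compiled chasm module bytes
WebAssembly.instantiate(wasm, importObject).then(({ instance }) => {
  (instance.exports.run as () => void)(); // invoke the exported entry point
});
```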

For the next version of my chasm language, I wanted to implement more complex expressions. I wanted to create expression trees to allow me to do some fairly simple maths. I'm not going to delve into each of the steps in quite so much detail, and I'm going to accelerate a little bit here. To support this in my tokenizer, I only had to add another couple of regular expressions, and that's about it. I had to add a regular expression to match brackets, and that only took me five minutes because I had to work out the escaping and all that lot. Then I have another regular expression which matches the various operators I support, and that's it. My tokenizer is good to go.
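In sketch form, that amounts to two more matchers alongside the earlier ones (again, the patterns are illustrative):

```typescript
matchers.push(
  { type: "parens", regex: /^[()]/ },                     // opening and closing brackets
  { type: "operator", regex: /^(\+|-|\*|\/|==|<|>|&&)/ }, // the supported operators (illustrative set)
);
```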

Looking at the parser side of things, the only thing I had to update was my expression parser. This is a little bit more interesting. Here's what happens if it encounters parentheses in the array of tokens. In the array of tokens, you expect to see the left-hand operand, then the operator in the middle, then the right-hand operand - the parser expects them in that order. To allow nesting, the left and the right-hand sides use recursion to recursively call the expression parser once again. With a few additional lines of code, my expression parser is now able to construct an abstract syntax tree which is truly tree-like. Here this print ((42+10)/2) is encoded as that abstract syntax tree.
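A hedged sketch of the parenthesised case, building on the parser sketched earlier (assuming the same eatToken helper, and with ExpressionNode widened to include a binaryExpression node type):

```typescript
interface BinaryExpressionNode {
  type: "binaryExpression";
  left: ExpressionNode;
  right: ExpressionNode;
  operator: string;
}

const parseExpression = (): ExpressionNode => {
  const token = eatToken();
  if (token.value === "(") {
    const left = parseExpression();    // recurse for the left-hand operand
    const operator = eatToken().value; // the operator sits in the middle
    const right = parseExpression();   // recurse for the right-hand operand
    eatToken();                        // consume the closing ")"
    return { type: "binaryExpression", left, right, operator };
  }
  return { type: "numberLiteral", value: Number(token.value) };
};
```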

Moving on to the emitter. Again, there are a few additional things going on here, which I'm going to point out. My expression emitter now uses a visitor pattern. I'm sure you'll have probably heard of the visitor pattern before. It's a fairly classic software engineering pattern. In this case, I'm using a tree visitor. My abstract syntax tree is a tree, and my traverse function visits every node on that tree, executing a function - that's the visitor. This is a depth-first post-order traversal. What that means is that it visits the left-hand node, then the right-hand node, then the root. The reason it does that, again, is that WebAssembly is a stack machine; that sets up the operations in the correct order. Then, when it encounters a binary expression, all it has to do is convert the operator into the right opcode, and that's pretty much it.
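A hedged sketch of that traversal and of the operator-to-opcode mapping (the opcode values come from the wasm spec; the node shapes follow the earlier sketches):

```typescript
type Visitor = (node: ExpressionNode) => void;

// Depth-first, post-order: left child, then right child, then the node itself.
const traverse = (node: ExpressionNode, visitor: Visitor) => {
  if (node.type === "binaryExpression") {
    traverse(node.left, visitor);
    traverse(node.right, visitor);
  }
  visitor(node);
};

// Converting the operator into the corresponding f32 opcode.
const binaryOpcode: Record<string, number> = {
  "+": 0x92, // f32.add
  "-": 0x93, // f32.sub
  "*": 0x94, // f32.mul
  "/": 0x95, // f32.div
};
```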

Demo time once again. If I'm lucky, if I run that, it does some basic maths for me. One thing that I found interesting here is that it took me quite a while to set up my original compiler architecture, the parser, tokenizer, emitter, and that sort of thing. Once I started to add extra features to my language, it became really quite easy. Just adding the concept of expression trees which can be executed - that was two or three lines of extra code in the tokenizer, maybe 10 or so extra lines of code in the parser, and maybe another 10 in the emitter. All of a sudden, my language is a lot more powerful. I'm going to again accelerate a little further. I'm not going to go into all of the details. I'm just going to touch on a few different things.

For the next version of chasm, I wanted to add variables and while loops. We'll look at how variables map between my language and WebAssembly. WebAssembly is composed of multiple functions, and functions have parameters and a return value, as with most languages. They also have the concept of locals, so each function has zero, one or more local variables. On the left-hand side here, you'll see my simple chasm application. It takes a variable, assigns it the value 23, and then I'm printing the value of that variable. On the right-hand side, this is roughly speaking how you do the same with WebAssembly. We define a function that has a single local, which is my variable f. We set up a constant and store it within that local, so set_local 0. Then we retrieve it using get_local 0 and then call my print function. I know an optimizing compiler would trash a few of those operations, but I hope you get the point.
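In the WebAssembly text format, the right-hand side being described looks roughly like this sketch (the surrounding module structure and the import of print are elided, and $print is an assumed name):

```wat
(func
  (local f32)   ;; the local backing the chasm variable f
  f32.const 23
  set_local 0   ;; var f = 23
  get_local 0
  call $print)  ;; print f
```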

Mapping variables from my language to WebAssembly is really quite easy. All I have to do is maintain a symbol table which maps the variable name to the index within the function, and that's it. It was really quite simple. While loops - again, surprisingly simple. An interesting thing about WebAssembly is that even though it is an assembly-like language, it has some surprisingly high-level concepts intermixed with that. You've already seen that it has functions, which is quite surprising for something that claims to be an assembly language. It also has loop constructs, it has if and else. For example, when I wanted to implement while loops, I was able to use blocks and loops within WebAssembly. The way this works is the loop condition is encoded, and then the next thing it does is it uses the eqz opcode, which determines whether the current value on the stack is equal to zero. The next one is break-if, or branch-if, to a stack depth of one. What that means is that if the stack value is equal to zero, it breaks to an execution stack depth of one, which breaks it out of both the loop and the block. If that's not the case, it will execute the nested statements and then break to a stack depth of zero, which repeats the loop.
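A hedged sketch of how the emitter might lower a while statement, building on the earlier emitter sketch (code, emitExpression and unsignedLEB128 are from that sketch, emitStatement is an assumed helper, and the opcode values come from the wasm spec):

```typescript
const block = 0x02, loop = 0x03, br = 0x0c, br_if = 0x0d, end = 0x0b;
const i32_eqz = 0x45, emptyBlockType = 0x40;

const emitWhile = (node: { expression: ExpressionNode; statements: StatementNode[] }) => {
  code.push(block, emptyBlockType);
  code.push(loop, emptyBlockType);
  emitExpression(node.expression);        // encode the loop condition
  code.push(i32_eqz);                     // is the condition false (zero)?
  code.push(br_if, ...unsignedLEB128(1)); // if so, break out of the block entirely
  node.statements.forEach(emitStatement); // otherwise, execute the nested statements...
  code.push(br, ...unsignedLEB128(0));    // ...and branch back to repeat the loop
  code.push(end);                         // end of loop
  code.push(end);                         // end of block
};
```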

Let's give that a quick demo. This is going to be a little bit more complicated. Let's start with, "var f = 0, while (f < 10), f = (f + 1), print f, endwhile." That all works, super chuffed. As with the previous upgrade to the chasm language, it wasn't that hard to add these relatively high-level concepts.

Finally, chasm version 1.0, time for a major release - the setpixel function. Rendering a Mandelbrot set is really quite simple. You don't need that many different language constructs to do the basic maths. The final piece of the puzzle I needed was setpixel. This is interesting because, as I mentioned a few times, WebAssembly has no built-in I/O. How do you write to the canvas in order to render to the screen with WebAssembly? You could use function imports and exports. I could have a JavaScript function which is called setpixel and import that into my WebAssembly module, but that would be relatively inefficient. It would be quite chatty over the WebAssembly-JavaScript boundary. There's actually a slightly smarter way of doing it.

Previously I mentioned that the only way to do I/O with WebAssembly is through function imports and exports. There's actually an additional way of performing I/O as well. WebAssembly modules can optionally have a block of linear memory, and they are able to read and write to it using store and load operations. Interestingly, this linear memory can be shared with the hosting environment. In the case of a JavaScript host, this is an ArrayBuffer. Both your WebAssembly application and your JavaScript application can read and write to the same block of memory. What I did was basically set the memory up as video RAM - this is my kind of virtual canvas.
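A hedged sketch of treating the shared linear memory as a virtual canvas; the canvas size, the export name, the "env" namespace and the pixel layout are all assumptions for illustration:

```typescript
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64KiB page of linear memory

declare const wasm: Uint8Array; // the compiled chasm module bytes
WebAssembly.instantiate(wasm, { env: { memory } }).then(({ instance }) => {
  (instance.exports.run as () => void)(); // setpixel calls write into linear memory

  // The same block of memory is visible to JavaScript as an ArrayBuffer,
  // so it can be copied straight onto a canvas.
  const canvas = document.querySelector("canvas")!;
  const ctx = canvas.getContext("2d")!;
  const pixels = new Uint8ClampedArray(memory.buffer, 0, 100 * 100 * 4);
  ctx.putImageData(new ImageData(pixels, 100, 100), 0, 0);
});
```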

This is the final demo, and I'm not going to type that all out in front of you because I'd never get it right, but that renders the Mandelbrot set. My chasm language is complete. I must admit I worked on this quite a lot in my evenings.

Recap

Finally, to recap, WebAssembly is a relatively simple virtual machine. It has something like 60 opcodes. It's got a really quite simple runtime model. For me, I find that quite fascinating. I'm used to using web technologies that I don't understand, and by that I mean I don't understand them under the hood. I'd like to think I understand how to use them, but things like React, for example, I haven't got the foggiest how it works under the hood. Whereas with WebAssembly, it's quite enjoyable to find a new concept on the web that you can literally understand everything about it. As a result, I find it quite a fun playground for doing some of the things that I used to do back in the kind of eight-bit computing era. I spend a fair bit of time writing WebAssembly by hand, not because I'm crazy, it's fun.

As a bit of an aside, I don't use TypeScript nearly as much as I should. This project was a really nice reminder for myself about how powerful TypeScript is. For example, the structure of my abstract syntax tree is defined as TypeScript interfaces. I get type checking support, which really ensures that my parser is quite robust.

I also found that creating a simple compiler isn't as hard as I initially thought. It's also a good way of exercising your programming skills. There are quite a few concepts in there - things like visitor patterns and tokenizers - all kinds of interesting software engineering concepts that you come across when you set yourself the goal of writing a compiler.

Also, WebAssembly is a very new technology, and there's a lot of creative energy being poured into WebAssembly. If you take the time to understand it, there are quite a number of really interesting open-source projects that you can potentially contribute to once you've got that knowledge.

Hopefully, you have been inspired by this talk to find out a little bit more. Returning to my bucket list, I've ticked the final one off, I guess - or maybe not. This is one of those fun projects that spiralled out of control. Once I'd got that far, I thought, "I could spend another few days doing strings, or arrays, or functions." The interesting thing is when you get to things like strings and arrays, you get to the really hard stuff. You get to, for example, memory allocation. WebAssembly doesn't have a garbage collector, so you need to manage memory yourself. You need to work out how to store these concepts within linear memory. It's a lot of fun.

That's how to build your own WebAssembly compiler. All the code is on GitHub if you want to play around with it. Also, it's arranged in such a way that each of the fictitious releases of chasm is a commit. You can roll right back to the beginning, which is that simple few lines of code that makes the first eight bytes, and go commit by commit, step-by-step, if you're interested in playing along at home.

Questions and Answers

Participant 2: There's a lot of great languages that we want to run on the web through WebAssembly. Is there any reason that tokenizing or the parsing would have to be changed for WebAssembly or is it just writing a new emitter for an existing compiler?

Eberhardt: You're talking about real languages now, aren't you? There are a few different ways of doing it. The first languages to compile to WebAssembly were C and C++ using the Emscripten compiler. Under the hood, that uses the LLVM toolchain, which is a modular infrastructure for building compilers. To your point about whether you have to reinvent the wheel - which is I guess what you're talking about - compiler technology is already relatively modular. In order to create the first WebAssembly compiler, the C and C++ team were able to build on some preexisting LLVM concepts. It depends on the language, though. For example, a number of the early languages used Emscripten and then LLVM, and some of them have used different compiler technologies. We're seeing quite a lot of divergence now in the technologies used.

Participant 3: Besides just for fun, have you been able to apply this to anything, like in your work?

Eberhardt: Yes. Not much, admittedly. It's all pretty new technology. In practice, there are a few people using WebAssembly in production, and the ones that are most well-known are AutoCAD, who have a huge C++ codebase. There's PSPDFKit, who've taken a half-million-line PDF rendering engine and moved that to the web. In my own line of work, I work within financial services, and we work with a company that had their own bespoke protocol for streaming pricing data. It's relatively old, and it was all written in C++, and they keep getting annoying people saying, "But I want to do it in Node," and they're, "But it's C++." We were able to help them: through Emscripten we compiled their client library for decoding their bespoke protocol, and wrapped a TypeScript layer on top of it to make it play nicely. Yes, we have used it in production, not in many cases, but the technology is only two years old.

Participant 4: I haven't looked into WebAssembly enough to really figure this out, but one thing that I've always been curious about is, for those more complex runtimes, how do they interact with memory? I saw in the assembly that you had that it's certainly putting things on the stack, in those implicit registers, I suppose, but how does it work when it's, say, C++?

Eberhardt: It's entirely down to the design of the language itself. You've basically got linear memory; you've got a big block of memory. At the moment, Rust ships with its own very lightweight allocator, whereas C#, through Blazor, actually ships a garbage collector, which is compiled to WebAssembly. It's a bit of a blank canvas. It's got that kind of low-level feel. It's up to the language designers and the implementers how they use that memory to best suit their language.

Participant 4: What I'm unclear about is, is there a particular opcode for saying, "I want this much memory and then [inaudible 00:36:30] space," or something like that?

Eberhardt: You ask for a certain number of pages of memory, and you can also grow and shrink memory dynamically, but that's how it works. You allocate a block of memory upfront.
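From the JavaScript side, that request for pages (each 64KiB) and the later growth look roughly like this sketch:

```typescript
const memory = new WebAssembly.Memory({ initial: 2, maximum: 10 }); // ask for two pages upfront
console.log(memory.buffer.byteLength); // 131072 bytes (2 x 64KiB)

memory.grow(1); // request one more page
console.log(memory.buffer.byteLength); // 196608 bytes
```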

Participant 5: When you are working with Rust, or C#, or C code, can you do any I/O, or do you need to have JavaScript in between the two of them?

Eberhardt: WebAssembly has no built-in I/O at the moment. However, there's a working group called WASI, which stands for WebAssembly System Interface, which is defining a core set of I/O operations. Although they're not really designed for the browser, they're designed for out-of-browser WebAssembly. The WebAssembly runtime is being used for things like serverless functions, for writing smart contracts on the blockchain. There is a real need for a standard set of system interfaces there. In the browser, that's making use of automatic generation of bindings. There's a certain amount of glue code required on each side to bridge that boundary. At the moment you can generate lots of that. There's a thing called wasm-bindgen for Rust, which does exactly that. There's additional work, but most of that is hidden by tooling.

Participant 6: One of the dreams, of course, is that you have your backend language used on the front end, like Java or Python. Is that anywhere on the horizon or is that far away?

Eberhardt: That's a good question. Taking the example of PSPDFKit, the company that took their PDF rendering engine, they originally had a web version of that product, and that was all running on the server. Through WebAssembly, they were able to shift the same code into the browser and offer that as a commercial product. One of the main reasons that JavaScript is so popular is not how good a language it is - it's an awesome language - the reason it's popular is the ubiquity of the web platform. It's the biggest platform out there. I think WebAssembly is a great idea in that it allows other languages to be part of the most ubiquitous platform and runtime there is.

Participant 7: I believe there's Doom 3 running in WebAssembly in the browser?

Eberhardt: The Unreal Engine was one of the early demos from asm.js, which was a precursor to WebAssembly.

Participant 8: One way to convince other people to pay attention to WebAssembly would be if we had some formal way to quantify the speed benefits we get if we port our application to WebAssembly. Do you have some good examples?

Eberhardt: I've got some bad examples. Bad examples are the ones that are the most revealing. People do ask time and time again, "What's the performance of WebAssembly like?" My response to that is, "What's the performance of JavaScript like?" It's actually pretty damn good. If you look at most algorithmic benchmarks, JavaScript is maybe 30% slower than native code. You ask, "What's the performance of WebAssembly like?" It's only got that 30% gap to span. It can't actually get that much faster, which is why at the beginning of the talk, I was focusing on the time it takes for your JavaScript application to reach peak performance. That's what WebAssembly is improving. It's significantly reducing the amount of time it takes to reach peak performance. It's not adding much to peak performance because JavaScript is pretty fast as it is. It does provide better performance, but you've got to ask the right question.

Participant 9: Just piggybacking on what you were just saying. Would you say that perhaps the biggest value proposition for WebAssembly over JavaScript would be the ability to use other languages at near-native speed?

Eberhardt: Yes. I think that has to be one of the biggest propositions of WebAssembly. Yes, I'd say so. JavaScript is a highly capable language. There are times when it doesn't give you the performance that you need, but there are very few people here who probably have a real performance issue with JavaScript. Typically, your performance issue is elsewhere. It's in your use of the DOM APIs or something else. JavaScript is quite fast. Yes, the value proposition really is bringing other languages to the web, but also the value proposition is the WebAssembly runtime now being used on the edge, within clouds, and on the blockchain. It's bringing that kind of universal runtime to a whole host of other areas as well.

 


 

Recorded at:

Dec 23, 2019
