
Debate: Do We Need a Universal Web Bytecode?

by Abel Avram on May 22, 2013

Is a universal web bytecode worth the trouble of creating it? Is LLVM the solution? Which is better at running native code in the browser: Mozilla's asm.js or Google's PNaCl? This article gathers opinions expressed on the web on these issues.

A comment by Raniz on an ArsTechnica post regarding video codecs written in JavaScript sparked a series of reactions in the comments section and on the web. Raniz suggested a “standardized bytecode for browsers [that] would (most likely) allow developers a broader range of languages to choose from”, giving developers the option to choose the language they like for web programming instead of having to use JavaScript. The bytecode would be, like the JVM or CLR bytecode, a common platform for web development. The idea sounds interesting at first glance, and some even suggested using LLVM’s bitcode as the intermediary “bytecode.” There are already LLVM compilers for many languages including ActionScript, Ada, D, Fortran, Haskell, Java bytecode, Objective-C, Python, Ruby, Rust, Scala and C#.

The main problem with LLVM bitcode is that it is target-dependent: the bitcode generated for one architecture differs from the bitcode generated for another. Java, by contrast, has identical bytecode for all targets, with the JVM taking care of generating the native code for the machine it runs on. A universal web bytecode faces a series of other problems, some of them plaguing LLVM bitcode too (more details here). msclrhd noted several of them in his comment, from which we extract some excerpts:

The problem with standardizing on a bytecode is that you are restricting how the browser optimizes the JavaScript code…

You also have the problem of what bytecode to standardize on -- each JavaScript engine will have a different set of bytecodes with different semantics. All engines will need to agree on the bytecode to use.

There are also other considerations as the string representation differs between engines (V8/Chrome has an ASCII string variant; Mozilla keeps them all in UTF-16) and type representation (e.g. Firefox has "fatvals" that are 64-bit value types with 32-bits for the type and 32-bits for the value; 64-bit doubles take advantage of the representation of NaN values…

If the bytecode is binary, you have endian issues, floating point representation issues, etc.
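The last two excerpts can be made concrete with a minimal TypeScript sketch. It shows how the same 32-bit integer is laid out differently under the two byte orders, and how a type tag and a 32-bit payload can hide inside the bit pattern of a NaN double. The names `bytesOf`, `boxInt32`, `unboxInt32`, `tagOf`, `QNAN_PREFIX` and `TAG_INT32` are illustrative assumptions, and the tagging layout is hypothetical and far simpler than what Firefox or any other engine actually uses.

```typescript
// Write the same 32-bit integer with an explicit byte order and return its bytes.
function bytesOf(value: number, littleEndian: boolean): number[] {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, value, littleEndian);
  return [0, 1, 2, 3].map((i) => view.getUint8(i));
}

// Simplified NaN-boxing with a hypothetical layout (not any engine's real one):
// high 32 bits = quiet-NaN prefix plus a small type tag, low 32 bits = payload.
const QNAN_PREFIX = 0x7ff80000; // exponent all ones, top mantissa bit set
const TAG_INT32 = 1;            // hypothetical tag value for "32-bit integer"

// Store a 32-bit integer inside an 8-byte slot that reads as NaN when
// interpreted as a double. Byte order is fixed to big-endian on purpose so
// the layout does not depend on the host machine.
function boxInt32(payload: number): DataView {
  const view = new DataView(new ArrayBuffer(8));
  view.setUint32(0, (QNAN_PREFIX | TAG_INT32) >>> 0, false); // high word
  view.setInt32(4, payload, false);                          // low word
  return view;
}

// Read the payload back from the low word.
function unboxInt32(view: DataView): number {
  return view.getInt32(4, false);
}

// Read the type tag from the low three bits of the high word.
function tagOf(view: DataView): number {
  return view.getUint32(0, false) & 0x7;
}
```

Calling `bytesOf(0x11223344, true)` and `bytesOf(0x11223344, false)` yields the same four bytes in opposite order, which is the kind of mismatch a binary format has to pin down. A slot produced by `boxInt32` reads as NaN through `getFloat64`, yet the integer and its tag are still recoverable from the raw bits.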

Alon Zakai, a researcher for Mozilla working on Emscripten and asm.js, wrote an entire blog post on universal web bytecode, outlining some of the difficulties to be encountered in pursuing such a goal:

Some people want one bytecode, others want another, for various reasons. Some people just like the languages on one VM more than another. Some bytecode VMs are proprietary or patented or tightly controlled by a single corporation, and some people don't like some of those things. So we don't actually have a candidate for a single universal bytecode for the web. What we have is a hope for an ideal bytecode - and multiple potential candidates.

Zakai also made a list of requirements such a bytecode should meet:

  • Support all the languages
  • Run code at high speed
  • Be a convenient compiler target
  • Have a compact format for transfer
  • Be standardized
  • Be platform-independent
  • Be secure

While Zakai gives a new bytecode little chance of meeting these requirements, he does see JavaScript as the right candidate: “arguably JavaScript is already very close to providing what a bytecode VM is supposed to offer, as listed in the 7 requirements above,” also mentioning what’s still missing in JavaScript:

At this point the main missing pieces are, first (as already mentioned) improving language support for ones not yet fully mature, and second, a few platform limitations that affect performance, notably lack of SIMD and threads with shared state.

Can JavaScript fill the gaps of SIMD and mutable-memory threads? Time will tell, and I think these things would take significant effort, but I believe it is clear that to standardize them would be orders of magnitude simpler and more realistic than to standardize a completely new bytecode. So a bytecode has no advantage there.

After outlining more difficulties in creating a universal VM – type conflicts between languages, garbage collection issues – Zakai concludes:

So I don't think there is much to gain, technically speaking, from considering a new bytecode for the web. The only clear advantage such an approach could give is perhaps a more elegant solution, if we started from scratch and designed a new solution with less baggage. That's an appealing idea, and in general elegance often leads to better results, but as argued earlier there would likely be no significant technical advantages to elegance in this particular case - so it would be elegance for elegance's sake.

While it seems that a universal bytecode does not stand much chance of succeeding, there are still at least two major attempts at bringing other languages to the web. Both have started with C/C++, but the efforts can be extended relatively easily to other languages, and, interestingly enough, both use LLVM:

  • Mozilla: C/C++ –> LLVM bitcode –> Emscripten –> asm.js –> Browser
  • Google: C/C++ –> LLVM bitcode –> PNaCl –> Browser

asm.js is an attempt at standardizing a subset of JavaScript that would run in any browser, containing constructs that can be better optimized for speed by a JavaScript engine. Emscripten is a separate project that generates asm.js from LLVM bitcode. According to Zakai, C++ code runs in Firefox via asm.js at 50% of the speed of native code, and performance is expected to improve over time.
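For a flavor of the constructs involved, here is a small sketch of asm.js-style idioms written as ordinary TypeScript. The names `makeAdder`, `add` and `sumInts` are hypothetical, and the sketch deliberately omits the "use asm" directive and the strict module layout that the asm.js specification requires, so it illustrates the coercion style only and is not a valid asm.js module.

```typescript
// asm.js-style idioms, written as plain TypeScript for illustration only.
// A real asm.js module would begin with a "use asm" directive and follow the
// specification's stricter layout; this sketch just shows the coercions.
function makeAdder(heap: ArrayBuffer) {
  // All memory is one flat buffer, viewed here as 32-bit integers.
  const HEAP32 = new Int32Array(heap);

  // `x | 0` coerces a value to a signed 32-bit integer, which lets an engine
  // treat the arithmetic as integer arithmetic.
  function add(a: number, b: number): number {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }

  // `ptr` is a byte offset into the heap; `>> 2` turns it into an
  // Int32Array index, much like pointer arithmetic in compiled C code.
  function sumInts(ptr: number, count: number): number {
    ptr = ptr | 0;
    count = count | 0;
    let total = 0;
    for (let i = 0; (i | 0) < (count | 0); i = (i + 1) | 0) {
      total = (total + (HEAP32[(ptr + (i << 2)) >> 2] | 0)) | 0;
    }
    return total | 0;
  }

  return { add, sumInts, HEAP32 };
}
```

Because every value passes through `| 0`, results wrap around like 32-bit integers: `add(2147483647, 1)` gives `-2147483648`. Code in this style still runs in any JavaScript engine, just without the dedicated optimizations.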

PNaCl, recently announced by Google and covered in detail by InfoQ, runs C/C++ code in the browser in a sandbox at 80-90% of the native code speed with room to improve, according to David Sehr. While the performance is significantly better than Mozilla’s, it comes at a price: PNaCl has been in development for more than 2 years. It’s pretty hard to deal with endian issues, different pointer sizes, different floating point representations, etc. on multiple architectures. It would be simpler to enhance Chrome to include asm.js optimizations. But, on the other hand, asm.js may be too slow, as yab**uz commented:

And I will never use asm.js. Simply because it's too slow on non asm.js supported browsers. Epic Citadel at 20 fps on the latest Core i7-3770K is a joke. Slower than Flash Player!

JavaScript, a language created by Brendan Eich in 10 days in 1995, was meant to be a client scripting language that would infuse some dynamism into the static web pages of that time. Perhaps nobody foresaw the role this little language would play almost two decades later in spite of all the criticism and flaws it carried with it. JavaScript is heavily used today on the client side in all major browsers and it is making inroads on the server side, especially because of Node.js’ popularity. And that’s not because JavaScript is such a brilliant language, but because it’s so hard to bring major players together to work on a better solution and to switch all the gears of the software industry. Like HTTP and HTML, JavaScript is going to thrive in spite of its shortcomings and the fact that we all know that we could do better, if we just agreed on it.

Now that we are stuck with JavaScript, will we at least have a universal web bytecode? Do we need one? Will attempts to run code written in other languages in the browser, such as Mozilla’s asm.js or Google’s PNaCl, get traction? Which is better: asm.js or PNaCl? Have your say in the comments.


This would be a good question to do a poll on by Faisal Waris

Given that Mozilla and Google have competing versions (there is also ActiveX) there may exist a strong undercurrent in favor of creating a new web bytecode standard.

A new byte code would be awesome. by Adib Saikali

Personally I would love to see a brand new byte code that is built on the best ideas of the JVM, the CLR, and other established byte code engines. Something that defines a memory model and threads, and something that is toolable and makes it easy for tool developers to build great development tools and frameworks.

I find it hard to believe that such a byte code + Run-time Memory Model + Thread Model would not be significantly better than JavaScript and hopefully free of its political baggage and design flaws.

This subject has been pretty well covered, I think. It's not going to work. by Russell Leggett

Some previous coverage of this debate stretching pretty far back:
Hacker news from a few years ago:
news.ycombinator.com/item?id=1893686

Response from Brendan Eich:
www.aminutewithbrendan.com/pages/20101122

There have been other debates but that's pretty definitive for me. It really comes down to 2 major things in the source vs bytecode debate.
1. Bytecode has to be just as safe as source and so far the JVM has shown how hard that can be. Bytecode verification is not as simple as validating a high level language.
2. Fast virtual machines have to be custom built for their languages. You won't be able to make a common bytecode work equally well for different languages. CLR/JVM have clearly shown that while they can do static code that fits a certain pattern well, they struggle to optimize dynamic code.
3. Standardizing on anything else would be a major challenge.
4. What about interop? Obviously it can be done, but how? What about trying to support concurrency models that are taken for granted. Would ruby have to be all async too? What about Java? Is any browser vendor willing to take a hit for their engine by forcing it to accept a standardized bytecode?

If a common bytecode were the right way to go, I think Google would have championed it when they went the Dart route. For that matter, they would have at least tried to prove it by sharing a vm for Dart and JavaScript.

The road to multiple languages is by adding more to JavaScript which make it a good compiler target. asm.js, TypedArrays, and soon enough things like binary data will add more of the low level stuff which will help other languages target JavaScript.

Re: This subject has been pretty well covered, I think. It's not going to w by Cameron Purdy

1. Bytecode has to be just as safe as source and so far the JVM has shown how hard that can be. Bytecode verification is not as simple as validating a high level language.

This is not correct. Bytecode is much more rigid than "high level language", so validating it is far simpler. (I have built both "high level language" to byte code compilers, and byte code verifiers.)

Regarding the "JVM [showing] how hard that can be", I'm not sure what you're referring to; perhaps you mean security in Applets with respect to the SecurityManager? That issue had nothing to do with byte code validation.

2. Fast virtual machines have to be custom built for their languages. You won't be able to make a common bytecode work equally well for different languages. CLR/JVM have clearly shown that while they can do static code that fits a certain pattern well, they struggle to optimize dynamic code.

In practice, this has proved to be generally true. Byte codes that mirror a language design are easy to compile to, and byte codes that mirror a processor design are easy to optimize native code for, but in both cases, it is relatively difficult to compile arbitrary languages in a compact manner to a byte code that was not designed for that language.

3. Standardizing on anything else would be a major challenge.

True. It seems difficult to get tech companies to work cooperatively together on anything.

4. What about interop? Obviously it can be done, but how? What about trying to support concurrency models that are taken for granted. Would ruby have to be all async too? What about Java? Is any browser vendor willing to take a hit for their engine by forcing it to accept a standardized bytecode?

Yes, and what about JavaScript, which is inherently single-threaded? (There is no concurrency.)

At any rate, I'm not sure that I have any better answer to the original question, but it certainly is an interesting technical area.

Peace,

Cameron Purdy | Oracle
For sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

Re: This subject has been pretty well covered, I think. It's not going to w by Russell Leggett

1. Bytecode has to be just as safe as source and so far the JVM has shown how hard that can be. Bytecode verification is not as simple as validating a high level language.

This is not correct. Bytecode is much more rigid than "high level language", so validating it is far simpler. (I have built both "high level language" to byte code compilers, and byte code verifiers.)

Regarding the "JVM [showing] how hard that can be", I'm not sure what you're referring to; perhaps you mean security in Applets with respect to the SecurityManager? That issue had nothing to do with byte code validation.


Yes, I should have been more thorough before I wrote this part, I was working from memory here (shame on me). I munged together a couple of things. One was a quote from Douglas Crockford (which is unsubstantiated and clearly debatable): "JavaScript's parser does a more efficient job of providing code security than the JVM's bytecode verifier."

The other thing I remember seeing was this article demonstrating a denial of service attack on the bytecode verifier. It looks like it's been fixed, so it is clearly not an insurmountable problem; perhaps we'll call it even on this one.

4. What about interop? Obviously it can be done, but how? What about trying to support concurrency models that are taken for granted. Would ruby have to be all async too? What about Java? Is any browser vendor willing to take a hit for their engine by forcing it to accept a standardized bytecode?

Yes, and what about JavaScript, which is inherently single-threaded? (There is no concurrency.)


shrug People are working on it. River Trail for data parallelism, technically there are web workers, and I believe there is some work being done on Transferable Objects to prevent code duplication (and I've even heard rumblings of something like linear types). My larger point was simply that JavaScript is not going away; even if some bytecode came along, JavaScript would still have to work, and that would mean conforming to the event loop model.

Universal web bytecode by Serge Bureau

I think we definitely need it.
The JVM (maybe improved) would be a good choice.
I could not care less about Javascript, it basically cost us ten years of possible advancement.
We need many languages available on the web.

Like Scala, Clojure, ...
We need to bring modern languages to the web, the actual situation is pathetic.
The browsers stay archaic because of this.

Let's bring the web to our century?

Re: Universal web bytecode by Russell Leggett

The JVM had its chance and clearly failed on the web... miserably.

Re: Universal web bytecode by Serge Bureau

In case you didn't notice, that was 15 years ago
Since then the web is more a mess than ever, and the best tool for multi languages is the jvm
Not the ridiculous javascript

Re: Universal web bytecode by Russell Leggett

In case you didn't notice, that was 15 years ago
Since then the web is more a mess than ever, and the best tool for multi languages is the jvm
Not the ridiculous javascript


The Java plugin is still around (sort of) and Sun/Oracle have even tried (with no success) to revive client side Java. Remember JavaFX? Remember Java 6 u 10?


Next Generation Java Plug-in
This release introduces a new (default) implementation of the Java Plug-in that provides support for applets in the web browser. The next generation Java Plug-in combines the best architectural features of applet and Java Web Start technologies. It provides a robust platform for deployment of Java and JavaFX content in the web browser.

The next generation Java Plug-in offers many powerful features for both advanced consumer content and enterprise applications. Some of these are:

- ability to increase the heap size and specify command-line arguments on a per-applet basis
- ability to select a particular version of the Java Runtime Environment for an individual applet
- improved reliability
- better and more portable integration between the Java and JavaScript programming languages
- improved support for accessing the DOM of the containing web page
- enhanced support for web services


The Java plugin is still terrible.

Now don't get me wrong, I don't hate Java or the JVM. I use them every day. I also program a lot of JavaScript. You might hate JavaScript as a language, but it has done a much much better job of being useful for writing apps in browsers. Does that completely damn the idea of bytecodes if we were to start from scratch? I'm not sure, but it *is* a completely different debate. Your argument lacks substance. How do we go from the web we have to a non-crappy version of the JVM for browsers? How do we handle backwards compatibility with current websites? How do we handle slow startup times? How do we handle standardization? How do we handle versioning?

- Russ

Re: Universal web bytecode by Serge Bureau

My original message stated an improved JVM plugin.
The security scheme was wrong from day one, plus the fact that Oracle did not care about it before there was a big fuss about security issues lately.

But the JVM supports a whole list of languages and has a very large number of libraries, you certainly do not want to throw that away.

And with the likes of Scala, Clojure, ... No alternatives are coming close.

So yes it needs some improvements, but starting over would be doomed.

Re: Universal web bytecode by Russell Leggett

Starting over would be doomed. I agree. Which is why replacing JavaScript with bytecode would be doomed. The JVM would never happen, if for no other reason than politics. You obviously couldn't just drop the JVM into all of these browsers: it's owned by Oracle and the open source licensing is too restrictive. Oracle has also proven to be litigious if they don't like what you're doing. No browser manufacturer would touch it with a thousand foot pole.

The only way a bytecode alternative could feasibly happen is if someone actually built a working example and pursued it and proved its value, the way Google is trying to do with Dart. And given that Google went with making Dart from scratch with its own language vm (not bytecode vm), I suspect that smarter people than me have given this some serious consideration. And wouldn't you know, they actually wrote an article explaining why they did not use a bytecode vm. And remember - this is coming from Lars Bak, who was technical lead of the HotSpot team at Sun. Not to mention that Google *already wrote a VM for JVM bytecodes for Android*. It's not like they're opposed to JVM bytecodes, but they still went with a language VM for Dart.

There are a few things to be upset with about the state of JavaScript and the browser:
- The language itself (syntax, type system, primitives).
- The lack of capability (APIs offered by the browser).
- Lack of libraries.
- Performance.
- It's not your favorite language.

Those are all being worked on. ES6 is getting a pretty serious overhaul to fill gaps and smooth out the rough spots. Overall, I actually think they're doing a pretty good job while not making it a whole new language like ES4 was basically doing. Browsers are greatly improving capabilities in terms of APIs - graphics, database, network, filesystem, audio, video and more, while maintaining the security expected of browsers. Libraries are a sort of tricky thing because they only get built when they're needed, but the third party and open source libraries available to JavaScript are growing more rapidly than for any other language - at least if you're looking at Github. Additionally, with the advent of Emscripten and asm.js, we'll likely start seeing more serious reuse of C/C++ libraries. Performance is obviously something that has gotten a lot of attention, and while not yet there for many use cases, the bar for what can be a web app is going up rapidly.

As for the last one - it's not your favorite language. Obviously, unless JavaScript is your favorite language, it won't be your favorite language. But compiling to JavaScript has never been easier, and it is a priority for the ECMAScript committee and browser makers. SourceMaps are gaining traction, and things like the new module loader API are making sure that there are hooks for compilation in the browser. Compiling to JavaScript is becoming a reality. For Java, you've got GWT. ClojureScript is actually getting some production use and the reviews are pretty positive. Kotlin, while not even ready for the JVM, already has some support for JavaScript. It is being developed by JetBrains in tandem. And believe it or not, even Scala is getting love with Scala+GWT. There are also plenty of others. TypeScript is well suited to the platform but offers better type safety, which is more comfortable for Java devs. Dart is also pretty Java friendly.

I can see the complaints, and honestly, I've wondered about bytecodes as the answer myself, but it really is a pretty naive hope upon further research. The road through JavaScript is just going to get to multi-language faster than trying to start over with bytecodes would. Sure, it's not quite there yet, but the universal web bytecode approach hasn't even left the gate.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.