Rubinius Progress - Interview with Brian Ford
We talked to Brian Ford (brixen on IRC) about what's been going on in the Rubinius project.
Brian lists some of the changes in Rubinius that have happened in the past few months:
[..] There have been hundreds of commits and thousands of lines of changes in the past two months. A couple of highlights:
* Evan has added a JIT framework that can be enabled, along with a dynamically generated bytecode interpreter.
* Contributors have fixed and improved performance for a number of Ruby core library classes.
* We've reworked our bootstrap process to improve the quality of code in the core library.
* We've got a working instrumenting profiler that produces the same output as MRI's but better than 10x faster.
* Adam Gardiner has been getting the Ruby debugger working again on the new VM.
* I've updated our FFI implementation to be closer to that released by JRuby and the MRI FFI gem.
* I've completely redone our compiler specs in a much better format.
* The RubySpecs for 1.8 and 1.9 are merged and I've added significant features to MSpec to facilitate running the specs for 1.8 and 1.9. Keep in mind that Engine Yard, through my work, continues to be the major financial backer of the RubySpecs which benefits *every* Ruby implementation out there, MRI included.
LLVM, a framework for building compiler backends, has been generating interest for some time now. There are Ruby bindings for LLVM, which have already yielded a few interesting projects. Brian explains what's going on with Rubinius and LLVM:
LLVM is not currently turned on by default. We are exploring all options for native code generation. Evan has been working on a JIT assembler in C++. This is obviously a time-critical component. The gain from generating native code is directly offset by the time taken to generate the code at runtime.
LLVM is a huge and impressive library that can take C/C++ source code through all manner of cutting edge (and old school) optimizations to arch-specific machine code for a number of processors. We're still exploring how to best leverage its impressive power. One possibility is to generate LLVM IR during compilation and defer machine code generation till runtime.
The other thing to realize is that optimizations give a speed benefit correlated to the amount of code that can be optimized. Ruby is a language in which method calls play a very large role. Unless we can aggressively inline code, optimizations have very little to work with. We're exploring a more capable runtime type system that will enable effective inlining.
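The cost Brian describes is easy to see in plain Ruby: even the arithmetic and comparison operators below are, semantically, method calls (`Fixnum#+`, `Fixnum#<`) that the VM must dispatch on every trip around the loop. A toy illustration (not Rubinius-specific):

```ruby
# Every operator here is a method call the VM must dispatch:
# Fixnum#< and Fixnum#+ on each iteration.  An inlining JIT can
# collapse these into straight-line arithmetic; without inlining,
# an optimizer sees only tiny call-separated fragments of code.
def sum_to(n)
  total = 0
  i = 1
  while i < n        # Fixnum#< -- a method call
    total += i       # Fixnum#+ -- another method call
    i += 1
  end
  total
end

sum_to(10)  # => 45
```

This is why a runtime type system that enables inlining is on the roadmap: only once those dispatches are collapsed does the optimizer have enough contiguous code to work with.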
A big part of last year was spent on a major rewrite of the old C-based VM ("shotgun") into the new C++-based VM. Asked whether Rails runs on the new VM, Brian explains that support is coming back:
Not to the same level that it worked with shotgun. With the new VM, a lot of the fundamentals underlying the core library were rewritten. In particular, we have an issue with our Autoload implementation at the moment that is a blocker for running Rails/Merb. We're focusing on all these issues in Q1. We plan to have the significant macro and micro web frameworks running including Rails/Merb and ramaze, camping, Sinatra, waves... any I'm missing?
Asked about RubyParser, a Ruby parser written in pure Ruby, Brian doesn't mince words:
My responses here are a bit controversial, but I believe they are well supported by the evidence. There is no benefit to using RubyParser. It is the exact same technology (LALR(1), hard to read and inaccessible to the vast majority of programmers) as used by the MRI native parser, which we've used in Rubinius since the project was started. RubyParser is significantly slower and introduces an unnecessary incompatibility point relative to MRI.
We will eventually make the execution of Ruby code fast enough that using something like RubyParser will be conceivable. But until then, there is no benefit. Parsing is the Achilles' heel of Ruby implementations and for us it is a non-issue. We can use the MRI source almost directly.
If folks are really interested in parsing Ruby, for pete's sake explore a useful technology like PEGs (implemented by Treetop) that will give us composable grammars and truly open up grammars and parsing to lay programmers.
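For a flavor of what PEGs offer, here is a deliberately tiny hand-rolled sketch of the defining PEG feature, ordered choice with backtracking, in plain Ruby. Treetop generates real parsers from grammar files, so this illustrates only the idea, not Treetop's API:

```ruby
require 'strscan'

# Each rule is a lambda taking a StringScanner and returning the
# parse result (here, a parenthesized string) or nil on failure.
term = ->(s) { s.scan(/\d+/) }

# additive <- term '+' additive / term   (PEG ordered choice:
# try the first alternative; on failure, backtrack and try the next)
add = ->(s) do
  pos = s.pos
  if (l = term.(s)) && s.scan(/\+/) && (r = add.(s))
    "(#{l}+#{r})"
  else
    s.pos = pos          # backtrack before the next alternative
    term.(s)
  end
end

parse = ->(src) do
  s = StringScanner.new(src)
  out = add.(s)
  s.eos? ? out : nil     # require the whole input to match
end

parse.("1+2+3")  # => "(1+(2+3))"
```

Because rules are ordinary values, grammars built this way compose naturally, which is exactly the property Brian is pointing at.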
Evan Phoenix gave an overview of some Rubinius design improvements at RubyConf '08. Brian also summarizes the changes:
There are some big areas that we are exploring but they basically reduce to better compiler technology, better GC, better data structures, and better type system. Of course, this is iterative. But keep in mind that Rubinius is a project that is barely 2 years old. Evan has made outstanding architecture decisions throughout the life of the project and we've got an excellent foundation in the new C++ VM, Ruby compiler, and Ruby core library.
The Rubinius project has already brought many improvements to the Ruby space; Brian already mentioned the RubySpecs which are used by all Ruby implementations nowadays. Another library that originated in Rubinius is the Ruby FFI library:
First of all, while the push into FFI was started by Rubinius, the JRuby folks deserve credit for getting the MRI gem out there. It's a win-win solution for all Ruby implementations, but especially ones like JRuby which cannot ever support Ruby C-API extensions the way MRI and Rubinius can.
We chose to implement FFI because we had an arguably better API than DL and because it is very tied to the implementation. All the implementations can agree on the API that is provided to Ruby code, but there is almost nothing in the FFI implementation itself that can be shared.
Rubinius' FFI and the Ruby FFI gem currently differ in their support for callbacks:
They are not a huge priority right now, but we will add them. Rubinius is the only alternative implementation that offers Ruby C-API compatibility to C extension authors. We don't have to use FFI for everything because, for example, you can recompile ImageMagick with our ruby.h and you're done (not that ImageMagick will necessarily work right now since we're still working on the C-API, but for example we use the MRI Readline C extension directly in Rubinius).
See the Rubinius tag on InfoQ for more information on Rubinius.