Evan Phoenix on Rubinius - VM Internals Interview
Whereas other Ruby implementations either target existing VMs (JRuby, XRuby, Gardens Point Ruby.NET, IronRuby) or, like Ruby 1.x, are written in C, Rubinius takes another approach, borrowing ideas from Smalltalk VMs, particularly Squeak Smalltalk. Squeak is written in a subset of itself. This subset, called Slang, is basically C with a Smalltalk syntax and a few restrictions. One of the goals of Rubinius is to use this approach throughout the project as well. The language for this is called Garnet (formerly called "Cuby"), and work on it is under way. Evan explains the current status:
It's still something I plan on doing sooner rather than later. There ended up being a lot of issues we wanted to tackle first, and we haven't gotten back to working on Garnet (the new name for Cuby) yet. We haven't run into any particular problems so far, but I'm sure we'll find some.
Garnet looks like Ruby at first glance, but the semantics of what things mean have been rewired.
For example, in Garnet code 'd = c.to_ref' appears to call a method called to_ref on c, but Garnet will translate that into 'd = &c', which is C code. One way to think about it is as a really advanced C preprocessor. It tries to map as much as it can to C constructs. The idea is something that looks like Ruby, but behaves like C.
Currently, basic pieces of the VM are written in C, also using an approach from Smalltalk. Evan explains the idea behind primitives:
They're small chunks of C code that can be called from Ruby. They're used to implement things that really cannot be done in Ruby. A great example is the ability to allocate an object. On the backend, this interacts with the garbage collector to find enough memory to allocate the specific object in. That operation cannot be written in Ruby; it's an operation at the bottom of the stack, and thus a primitive operation.
Rubinius primitives are exactly the same as Smalltalk's primitives. If a method is assigned a primitive operation number, then when the method is run, the primitive is invoked instead of the normal Ruby code. If the primitive fails (the primitive itself indicates that it has failed), then the Ruby code is run as fallback behavior. This Ruby code can do things like raise an exception explaining why it failed, convert the arguments and try again, or any number of other things.
To show how this actually looks, here is an example of a Rubinius primitive:
def fixnum_size(_ = fixnum)
Evan runs us through the code:
The primitives and instructions use a kind of funny format to make maintaining them easier. All the operations are just Ruby methods whose body is a string containing C code. At build time, these files are run and some code calls each method, collecting the C code and then writing it out to a file which is #include'd into other C code. One of the primary reasons for this is that primitives and instructions are wrapped in a huge switch statement in C, which is a pain to maintain by hand.
Also, it gives us some preprocessor capabilities. For example, here you see that we've put a little magic sauce in the argument definition of the fixnum_size primitive. The point of that code is to automatically register that the C code "POP(self, FIXNUM_P)" should be run before the code in the method. The code that runs the methods and outputs C code will notice that and write it out properly. We use this form to make the primitives easier to write.
I should note that not all primitives use this form currently. Soon, we'll be doing an audit and switching out all of them to use it.
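To make the build step Evan describes a little more concrete, here is a minimal Ruby sketch under stated assumptions: each primitive is a method whose return value is a C string, a default-argument helper named `fixnum` (mirroring the fixnum_size signature shown earlier) records the "POP(self, FIXNUM_P)" prologue, and a driver wraps every body in one C switch statement. The class `PrimitiveGenerator`, its `generate` method, the `prim` switch variable and the C macros other than POP/FIXNUM_P are hypothetical. The interview does not show the real fixnum_size body, so the one here is a placeholder.

```ruby
# Hypothetical build-time generator in the spirit of the one Evan describes.
# Only POP(self, FIXNUM_P) comes from the interview; the other C macros
# (RET, I2N, N2I) and all method bodies are placeholders.
class PrimitiveGenerator
  def initialize
    @prologue = []
  end

  # Argument "magic": Ruby evaluates a default value only when the argument is
  # omitted, so calling fixnum_size with no arguments runs this and records the
  # C that must run before the primitive's own body.
  def fixnum
    @prologue << "POP(self, FIXNUM_P);"
    nil
  end

  # Each primitive is a plain Ruby method whose return value is C source.
  def fixnum_size(_ = fixnum)
    "RET(I2N(sizeof(long)));"
  end

  # A primitive written without the magic: it pops its own operand.
  def fixnum_negate
    "POP(self, FIXNUM_P);\nRET(I2N(-N2I(self)));"
  end

  # Call every listed method, collect its C code and wrap it all in one switch.
  def generate(names)
    cases = names.each_with_index.map do |name, number|
      @prologue = []
      body = public_send(name) # default arguments are evaluated during this call
      lines = @prologue + body.lines.map(&:chomp) + ["break;"]
      "case #{number}: /* #{name} */ {\n" + lines.map { |l| "  #{l}\n" }.join + "}\n"
    end
    "switch (prim) {\n" + cases.join + "}\n"
  end
end

# The build would write this string to a file that the VM's C code #includes.
puts PrimitiveGenerator.new.generate(%i[fixnum_size fixnum_negate])
```

Because the prologue is collected as a side effect of Ruby's default-argument evaluation, simply calling each method with no arguments is enough to trigger the registration. That is one plausible way such "magic sauce" could work, not necessarily the one Rubinius uses.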
As can be seen, the borders between Ruby and C code are fuzzy and are going to shift in the future, as more of Garnet becomes available. Further up the stack, Ruby is used for implementing standard libraries.
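As a rough illustration of the "looks like Ruby, behaves like C" idea Evan described, the toy translator that follows handles only the single rewrite he mentioned, turning `d = c.to_ref` into `d = &c;`. It is a sketch under assumptions: the `translate` and `translate_line` functions and the regex approach are hypothetical, and a real translator such as Garnet would more plausibly work on a parsed syntax tree than on raw text.

```ruby
# Toy "Ruby-looking source in, C out" translator (hypothetical; not Garnet).
# It handles only the to_ref rewrite from the interview: x.to_ref becomes &x.

TO_REF = /\b([A-Za-z_]\w*)\.to_ref\b/

# Translate one statement: rewrite every to_ref call, then terminate it like C.
def translate_line(line)
  c = line.strip.gsub(TO_REF) { "&#{Regexp.last_match(1)}" }
  c.end_with?(";") ? c : "#{c};"
end

# Translate a multi-line snippet, skipping blank lines.
def translate(source)
  source.lines.map(&:strip).reject(&:empty?).map { |l| translate_line(l) }.join("\n")
end

puts translate("d = c.to_ref") # prints: d = &c;
```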
The core VM (which is really just the opcodes) is currently written in C, as are the primitives. The primitives are the first thing I want to use Garnet on. Both garbage collectors are also written in C (though the amount of code they consume is small).
Other than that, everything is in Ruby, from parsing the interpreter's command-line arguments (things like rubinius -d -v, which turn on debugging and warnings) to all the methods on String. The complete runtime environment can easily be manipulated, as it's just Ruby.
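The split Evan describes relies on the primitive rule he explained earlier: a method tagged with a primitive number runs the primitive first, and its Ruby body runs only if the primitive reports failure. The following minimal Ruby model sketches that rule, with the C primitive stood in for by a lambda. Everything here is hypothetical: the `PRIMITIVES` table, primitive number 1, the `PRIMITIVE_FAILED` sentinel, `TaggedMethod` and `call_method` are illustrative names, not Rubinius APIs.

```ruby
# Minimal model of primitive dispatch with a Ruby fallback (hypothetical names).
# A real VM would implement the primitive in C; here a lambda stands in for it.

# Sentinel a primitive returns to signal "I failed, run the Ruby code instead".
PRIMITIVE_FAILED = Object.new.freeze

# Primitive table indexed by a hypothetical primitive number.
PRIMITIVES = {
  # Integer addition that only handles Integer operands.
  1 => lambda { |recv, arg| recv.is_a?(Integer) && arg.is_a?(Integer) ? recv + arg : PRIMITIVE_FAILED }
}.freeze

# A method that may carry a primitive number plus its ordinary Ruby body.
TaggedMethod = Struct.new(:name, :primitive, :fallback)

def call_method(meth, recv, *args)
  if meth.primitive
    result = PRIMITIVES.fetch(meth.primitive).call(recv, *args)
    return result unless result.equal?(PRIMITIVE_FAILED)
  end
  # No primitive, or the primitive reported failure: run the Ruby code.
  meth.fallback.call(meth, recv, *args)
end

ADD = TaggedMethod.new(:add, 1, lambda { |meth, recv, arg|
  if !arg.is_a?(Integer) && arg.respond_to?(:to_int)
    # Convert the argument and try the method (and its primitive) again.
    call_method(meth, recv, arg.to_int)
  else
    # Nothing sensible to retry: raise an exception explaining the failure.
    raise TypeError, "add failed for #{recv.inspect} and #{arg.inspect}"
  end
})

puts call_method(ADD, 2, 3)   # prints 5 via the primitive
puts call_method(ADD, 2, 3.9) # primitive fails, fallback converts 3.9 to 3, prints 5
```

The fallback exercises two of the options Evan lists: converting the argument and trying again, or raising an exception that explains the failure. It only retries when conversion actually changes the argument, so a receiver the primitive can never handle raises instead of looping.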
Rubinius has already made great strides. To keep this up, and to get more developers coding and more testers and users banging away at the existing VM, more information about the project needs to be available.
At the moment, the IRC channel #rubinius is the best source for information (besides, of course, the source). Some historic digests of this channel can be found online too. Evan details the plans to improve the transparency of the project:
We're really working hard to make the process more transparent. Currently, the IRC channel is the primary way we communicate, and we're actually working to integrate IRC logs into http://rubini.us. We also encourage people to use the forums on http://rubini.us to ask questions. The forum has RSS feeds set up, and most of the devs watch them for discussions (though I have to admit, this hasn't really caught on yet).
If people have any suggestions about making the project more transparent, I'm all ears. We can always use people writing specs. Once we have a complete spec suite, the rest falls into place pretty easily.
Watch out for part two of this interview, which will take a more technical look behind the scenes at the new debugging implementation, the GC, ObjectSpace and threading.