JRuby's New IR Paves the Way for Future Performance Improvements
JRuby has made great progress with regards to performance. A year ago, JRuby was the fastest implementation for Ruby 1.8 code. JRuby performance has been raised with every release, and further improvements are on the horizon with the new intermediate representation (IR) that is beeing developed by Subbu Sastry.
InfoQ talked to Subbu to learn how this new IR looks and what benefits it will bring to JRuby.
Before talking about the new intermediate representation, how does the compiler currently do its work without the IR?
Strictly speaking, the AST (abstract syntax tree) is also an IR (intermediate representation) – since it is not quite the source, and not quite the target language. An IR, in the abstract, is a representation (in-memory data structures, output "assembly", etc.) that has been chosen to capture the semantics of the source language but that also best enables the implementation in place.
Given that JRuby started off as an interpreter (the best and easiest way to get a reference / prototype implementation of a language), and had to be fully compliant with the C MRI implementation which operates off the AST (and possibly other reasons), the AST makes sense as the IR of choice.
(The JRuby AST is the output of the parser and is just a tree representation of the source code, containing different kinds of nodes, for example for classes, methods, variables. It is also used by tools that analyze code, for example to provide automated refactorings.)
Over the last few years, performance concerns led to the development of a compiler that translated the AST into native bytecode so that interpretation overheads could be reduced and also so that HotSpot could get a go at optimizing (inlining, etc.) and compiliation down to hardware-native code. The compiler doesn't blindly compile everything though I think it can be deployed as an ahead-of-time (AOT) compiler. It kicks in whenever method execution counters trip over some threshold.
Tom Enebo adds that "the JIT threshold is just a method invocation counter. The counter can be configured (it defaults to 50 right now) by setting:
-J-D jruby.jit.threshold=<invocation count>
We have a couple of other tunables for the JIT for things like controlling how many methods we JIT".
So how does the new representation look like, compared to the AST?
The AST, while good for interpretation, is probably not the best choice of IR when it comes to implementation of "traditional" (and other Ruby-specific) compiler "optimizations". The new IR translates the AST into a series of instructions which are somewhat like high-level assembly. An instruction is simply an operation (branch, call, receive arg, return, etc.) that operates on a bunch of operands (which can either be variables, or fixnums, floats, arrays, closures, etc.). At the start, a lot of operations end up being method calls, but the expectation is that some of these will optimized away (inlined, etc.) into native (jvm-native, i.e.) operations.
(Take a look at this post on the JRuby mailinglist to get an idea how the IR looks like.)
The IR will also include a bunch of other data structures like control flow graphs (to represent the flow of control), class and method hierarchies, and later on, possibly static single assignment (SSA) forms for even better optimizations. The goal throughout has been to represent the source in canonical form while preserving as much information from the front end as possible. As the compiler passes run, the higher-level IR instructions might get broken down into lower-level IR (call can get broken into lookup, method table load, and dispatch) or other implicit information gets explicit (heap frame allocation, heap frame loads and stores) so that they can get further optimized.
As a bunch of "assembly" instructions, this new IR is also more linear in its representation – it is not a tree – but as noted above, there are other auxiliary data structures (graphs) that explicitly represent control flow, and sometimes data flow.
What other benefits does this new representation bring?
This new IR is more suited for implementation of traditional compiler optimizations. In addition, we are going to experiment with techniques that (a) inlines closures (b) inlines method calls (c) converts fixnum/float calls to native integer/long/float/double operations while unboxing them (d) for non- inlined closures, reduce heap load/store overheads .. and possibly others. These would be much harder to do with an AST.
In addition, the expectation is that this IR will also consume less memory (if implemented properly, of course) – with memory peaking only during optimization phases. As a canonical representation of operations, it might enable a simple interpreter implementation (based off the IR instructions) which now has to only work off a bunch of 10s of operations, as opposed to 100s of ast node types.
Do you already have a plan when the IR will be used in JRuby?
We expect that it'll be a while before the benefits of the new IR will materialize. I would rather not put my feet in my mouth and say when this will be, but it will happen when it will happen. For me, this is a fun project :-).
But, the first steps we are planning to take is to build an interpreter that is based off the new IR instruction operations rather than AST nodes. This will help give us confidence that we have covered our bases with respect to capturing all information that needs to be captured, test our hypotheses about improved interpreter performance (we could be really off base, after all), and providing a base for implementing profiling (that will in turn feed the compilation system as to where it needs to turn its attention to).
But, the IR is partly in place (there are still some AST nodes which haven't been handled), control flow graphs are in place, some local optimizations are in place, dataflow framework is in place, some dataflow analyses and passes are in place, closure heap load/store optimization is in place, and more things will be added in the coming months.
In short, I don't think there is any real alternative to building a tiered implementation on top of the jvm simply because of the semantic mismatch between ruby and the jvm. We cannot expect the jvm to optimize away all the dynamic semantics of ruby which is a different and lengthy discussion altogether.
The code for the IR can be found on GitHub.