JSR 292 and the Multi-lingual JVM
Both Microsoft, through .NET 4 and the DLR, and Oracle, through the Da Vinci Machine project and Java 7, are looking to improve support for alternative languages that target their respective virtual machines. In doing so they reflect a growing trend amongst language developers and implementers, who are increasingly using pre-existing runtime environments to host their languages since developing a new runtime from scratch represents such a significant investment. The strengths of the JVM platform, such as efficient garbage collection, a robust security model and the wide availability of the JRE, coupled with the extensive selection of existing libraries and tools, have seen the JVM platform widely adopted for this purpose, with around 240 different languages implemented on top of it.
But whilst there are many reasons why the JVM is an attractive platform for language implementers, Java bytecode was designed to serve the needs of only one language, Java itself. As such it represents a considerable challenge for developers working on implementations of dynamic or functional languages. A common issue arises around the method call site. The JVM imposes a number of restrictions on how it interconnects methods, favouring either static reference or dispatch through a class or interface; the receiver type must conform to the resolved type of the call site, and the call site must link, which means the resolved method always pre-exists. Invocations are statically typed and expect Java types. These restrictions fit well with Java but other languages have other, typically more relaxed, invocation rules.
Since Java is a general-purpose language, it is possible to use an object-based API to simulate additional options or to relax the rules of method invocation. In his detailed paper on InvokeDynamic, John Rose refers to this approach as a simulator method. Java's reflection API includes a well-known example, java.lang.reflect.Method.invoke, and many other language-specific equivalents also exist, such as Clojure's clojure.lang.IFn.invoke or JRuby's DynamicMethod.call. Using this approach the JRuby team were able to achieve a Ruby implementation with roughly double the performance of the standard C version. This is also in essence the approach that Microsoft is taking with their DLR for .NET. But whilst the approach works, it introduces considerable execution overhead and complexity, and Sun decided to go a step further. Rather than adding a standard simulator method through a library as Microsoft is doing, the Da Vinci Machine project aims to make changes at the JVM level. JSR 292, which is targeted for inclusion in Java 7, is set to be the first result of the project to make it through to standardisation.
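To make the simulator-method idea concrete, here is a minimal sketch built on java.lang.reflect.Method.invoke. The class name SimulatorMethodSketch and the helper callByName are hypothetical and exist only for illustration; a real language runtime would cache lookups and handle arguments, which this sketch does not attempt.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of a "simulator method": the call target is
// chosen by name at run time instead of being linked by the JVM.
class SimulatorMethodSketch {

    // Looks up a public no-argument method by name and invokes it reflectively.
    static Object callByName(Object receiver, String methodName) {
        try {
            Method m = receiver.getClass().getMethod(methodName);
            return m.invoke(receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("dynamic call failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("hello", "length"));      // prints 5
        System.out.println(callByName("hello", "toUpperCase")); // prints HELLO
    }
}
```

Every call goes through a name lookup, access checks and boxing of the result, which hints at where the execution overhead mentioned above comes from.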
JSR 292 primarily focuses on the needs of dynamic languages. It introduces two key new concepts: a new method invocation instruction, and method handles with a corresponding type transformer factory. It also extends the grammar for Java identifiers so that an identifier's spelling can be any sequence of characters (referred to as "Exotic Identifiers").
InvokeDynamic and Method Handles
In Java 6 the bytecode specification has four method invocation instructions which correspond directly to Java method calls:
- invokestatic, which is used to invoke class methods
- invokespecial, which is used to invoke instance initialisation methods, superclass methods and private methods
- invokevirtual, the normal method invocation for an instance method
- invokeinterface, which is used for interface methods
Of these, the virtual and interface calls might serve the purposes of a dynamic language, but they are not flexible enough for the dynamic language runtime to bind the call site to a destination at runtime if the call site is not aware of the destination's interface prior to the bind. JSR 292 therefore adds a fifth method invocation instruction, invokedynamic, which acts as a marker indicating that a call specific to a dynamic language runtime occurs at this point. At the JVM level, an invokedynamic instruction is used to call methods which have linkage and dispatch semantics defined by non-Java languages.
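The linking step can be sketched in plain Java, with some caveats. The example below uses the java.lang.invoke package in which the API eventually shipped with Java 7, not the java.dyn package of the drafts discussed in this article. Java source cannot emit an invokedynamic instruction directly, so the sketch calls the bootstrap method by hand where the JVM would normally call it on first execution of the instruction. The names BootstrapSketch, greet and callDynamically are hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BootstrapSketch {

    // Hypothetical runtime helper that the call site ends up bound to.
    static String greet(String name) {
        return "Hello, " + name;
    }

    // Bootstrap method: receives the call site's name and static type and
    // answers with a CallSite whose target is a method handle.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws NoSuchMethodException, IllegalAccessException {
        MethodHandle target = lookup.findStatic(BootstrapSketch.class, name, type);
        return new ConstantCallSite(target);
    }

    // Stands in for an invokedynamic instruction: link once, then invoke.
    static String callDynamically(String arg) {
        try {
            MethodType type = MethodType.methodType(String.class, String.class);
            CallSite site = bootstrap(MethodHandles.lookup(), "greet", type);
            return (String) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callDynamically("JVM")); // prints Hello, JVM
    }
}
```

The call site remains statically typed as (String)String, but the choice of target is left to the bootstrap logic, which is where a language runtime would apply its own lookup rules.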
Like the other four invocation instructions, invokedynamic is statically typed. However, an invokedynamic instruction is dynamically linked under program control using a method handle. A method handle is not an entirely new concept, being similar to Smalltalk's perform or Objective-C's performSelector. On his blog Rose provides a succinct description of what it does:
Given any method M that I am able to invoke, the JVM provides me a way to produce a method handle H(M). I can use this handle later on, even after forgetting the name of M, to call M as often as I want. Moreover, if I provide this handle to other callers, they also can invoke M through the handle, even if they do not have access rights to call M by name. If the method is non-static, the method handle always takes the receiver as its first argument. If the method is virtual or interface, the method handle performs the dispatch on the receiver.
A method handle will confess its type reflectively, as a series of Class values, through the type operation.
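Rose's description maps onto the API roughly as follows. This is a minimal sketch that assumes the java.lang.invoke package from the final Java 7 release (the drafts discussed here used java.dyn); the class MethodHandleSketch and its helpers are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleSketch {

    // Produces H(M) for M = String.length(). The method is virtual, so the
    // resulting handle takes the receiver as its first argument.
    static MethodHandle lengthHandle() {
        try {
            return MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls M through the handle without naming M at the call site.
    static int lengthOf(String s) {
        try {
            return (int) lengthHandle().invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle h = lengthHandle();
        // The handle reports its type reflectively as Class values.
        System.out.println(h.type().parameterType(0)); // the receiver type
        System.out.println(h.type().returnType());
        System.out.println(lengthOf("hello"));         // prints 5
    }
}
```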
As well as binding methods to call sites late, dynamic languages do a large amount of automatic type conversion at runtime – converting a String of "12345" to an Integer for a given method, for instance. A dynamic language runtime developer can create a transform class with an adapter method suitable for being called through a method handle, but creating one for each possible signature and target would be unworkable. JSR 292 therefore introduces a factory for generating these adapter classes. JRuby lead Charles Nutter told us:
The MethodHandles API in JSR-292 provides the basic building blocks for writing simple "glue" between a caller and a target method. InvokeDynamic works by contacting your language or library when a dynamic call is made, to which you respond by providing a method handle (or chain of method handles) that connect caller and callee appropriately. For example, if in the process of making a dynamic call you need to reorder arguments, you'd insert a "permute arguments" handle into the chain. If you need to wrap the ultimate target with exception handling, you'd insert a "catch exception" handle. There's handles for accessing fields, calling other Java methods, splatting/spreading arguments (to and from "varargs" arrays), branching on a condition, and lots more.
If the available handles don't do everything you need, you can also extend JavaMethodHandle and implement the logic in Java yourself. This allows you to provide more complicated logic without decomposing everything into tiny pieces.
You can think of MethodHandles as "function pointers for the JVM" with the added bonus of representing more complicated call sequences in a form the JVM understands directly. In my opinion they're the coolest part of JSR-292.
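The kind of "glue" Nutter describes can be sketched with three of the standard combinators: permuteArguments, filterArguments (used here for the String-to-integer conversion mentioned earlier) and catchException. The sketch assumes the java.lang.invoke API as it shipped in Java 7; HandleChainSketch, subtract, onBadInput and apply are hypothetical names, and returning -1 on bad input is an arbitrary choice made only for the demonstration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HandleChainSketch {

    // Ultimate target of the chain.
    static int subtract(int a, int b) {
        return a - b;
    }

    // Handler used by the "catch exception" handle; -1 is an arbitrary fallback.
    static int onBadInput(NumberFormatException e, String a, String b) {
        return -1;
    }

    static final MethodHandle CHAIN = buildChain();

    static MethodHandle buildChain() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle subtract = lookup.findStatic(HandleChainSketch.class, "subtract",
                    MethodType.methodType(int.class, int.class, int.class));
            // Reorder arguments: swapped(a, b) calls subtract(b, a).
            MethodHandle swapped =
                    MethodHandles.permuteArguments(subtract, subtract.type(), 1, 0);
            // Convert each String argument with Integer.parseInt before the call.
            MethodHandle parseInt = lookup.findStatic(Integer.class, "parseInt",
                    MethodType.methodType(int.class, String.class));
            MethodHandle fromStrings =
                    MethodHandles.filterArguments(swapped, 0, parseInt, parseInt);
            // Wrap the whole chain with exception handling.
            MethodHandle handler = lookup.findStatic(HandleChainSketch.class, "onBadInput",
                    MethodType.methodType(int.class, NumberFormatException.class,
                            String.class, String.class));
            return MethodHandles.catchException(
                    fromStrings, NumberFormatException.class, handler);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the finished chain, whose type is (String, String)int.
    static int apply(String a, String b) {
        try {
            return (int) CHAIN.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("10", "12345")); // prints 12335 (12345 - 10)
        System.out.println(apply("ten", "1"));    // prints -1
    }
}
```

The whole adaptation is expressed as a graph of handles rather than as hand-written adapter classes, which is the form the JVM can see through when optimising.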
Writing on his blog, Charles Nutter describes his early experiences with using InvokeDynamic in detail. Using an early implementation of InvokeDynamic, Nutter and his team were able to simplify their implementation considerably for no loss in performance. We asked him if InvokeDynamic was yet able to beat the raw JRuby implementation.
I need to preface this by explaining that long before invokedynamic, we (JRuby) had done a *lot* of work to get performance as high as possible. This involves dozens of tricks designed to coax the JVM into optimizing Ruby code like it would optimize Java code, and I don't believe any other dynamic language on the JVM has done as much work as we have. So for invokedynamic to beat us it has to do a lot of work.
That said, invokedynamic in recent builds of OpenJDK 7 *has* started to beat stock JRuby for dynamic call performance. It's not faster 100% of the time, and you have to tweak the JVM's builtin optimization thresholds a bit, but it is definitely getting there. This is also without us fully utilizing invokedynamic and method handles throughout JRuby; only certain types of calls use invokedynamic, and we're sure to see similar benefits from wiring up the rest of JRuby. That will probably happen over the summer, as the invokedynamic reference implementation's API settles down a bit.
These three major JVM features of JSR 292 (InvokeDynamic, method handles and exotic identifiers) will have a corresponding source-code syntax implemented through a Project Coin proposal. This change will provide a mechanism for Java to interoperate with new JVM languages that rely on the InvokeDynamic bytecode instruction, and is also expected to reduce the dependency on bytecode manipulation techniques when using javac and the JVM as a platform for developing new programming languages and language runtimes.
Dynamic Invocation is supported by the static-only class java.dyn.InvokeDynamic:
Object x = InvokeDynamic.getMeSomething();
whilst a method handle to a virtual call can be retrieved using the class java.dyn.MethodHandle:
MethodHandle mh = ...;
It should be noted that both InvokeDynamic and MethodHandle expose a potential loophole for checked exceptions in that, whilst an InvokeDynamic or virtual call may generate a checked exception, there is no way to statically infer which exceptions may be thrown from a given call site at compile time. Whilst this further weakens the checked exception model in the Java language, the proposal wiki argues that it doesn't represent a new problem.
As Java programs are already constructed to cope with unchecked exceptions and error, the possibility of a missed catch of a checked exception is not considered to be a hazard. In fact, dynamic languages cannot be supported without some such relaxation of static exception checking.
The Coin proposal also offers direct support for exotic identifiers, introduced using a hash symbol and placed in quotes:
int #"strange variable name" = 42;
System.out.println(#"strange variable name"); // prints 42
An exotic identifier cannot be empty, so int #""; would be illegal. Certain "dangerous" characters, specifically / . ; < > [ ], are illegal in an exotic identifier unless they are preceded by a backslash character, even though this is not true for string or character literals. If a dangerous character is preceded by a backslash, the backslash is dropped and the character is collected. The compiler will reject a program that contains an exotic identifier if the escaped character would otherwise participate in a bytecode name forbidden by the JVM specification.
JSR 292 allows the JVM to combine bytecode, still the most efficient representation for static computations, and method handle graphs, which are a more efficient way to put together computations on the fly. As such, the new features of JSR 292 allow an easier mix-and-match approach to combining dynamic and static languages on the JVM, making it easier for developers to pick the right language for the job.
It is also reasonable to assume that future versions of the JVM will go further. Whilst JSR 292 itself has focused on features for dynamic languages, the Da Vinci Machine project has explored features that are more of a necessity for functional languages, such as tail calls, continuations and interface injection. It is likely that some of these will be standardised in future.