Java 7 Features Which Enable Java 8
It's a truism of the tech industry that developers are never happier than when there's free beer or an opportunity to complain about something on offer.
So despite the efforts of Mark Reinhold and the Java team to involve the community in the roadmap after the Oracle acquisition (the Plan A / Plan B decision), many Java developers feel that Java 7 was "not much of a release".
In this article, I'll try to refute this thesis, by exploring the features in Java 7 which lay the groundwork for the new features in Java 8.
Java has often been criticised for being overly verbose. One of the most common areas where this complaint is expressed is in assignment. In Java 6, we are forced to write assignment statements like this:
Map<String, String> m = new HashMap<String, String>();
This statement contains a lot of redundant information - we should be able to somehow have the compiler figure out more of this by itself, and not require the programmer to be quite so explicit.
In fact, languages like Scala do a large amount of type inference from expressions, and in fact assignment statements can be written as simply as this:
val m = Map("x" -> 24, "y" -> 25, "z" -> 26);
The keyword val indicates that this variable may not be reassigned to (like the keyword final for Java variables). No type information is specified about the variable at all - instead the Scala compiler examines the right-hand side of the assignment and determines the correct type for the variable by looking at which value is being assigned.
In Java 7, some limited type inference capabilities were introduced, and assignment statements can now be written like this:
Map<String, String> m = new HashMap<>();
The key differences between this and the Scala form is that in Scala, values have explicit types, and it is the type of variables that is inferred. In Java 7, the type of variables is explicit, and type information about values is what is inferred.
Some developers have complained that they would have preferred the Scala solution, but it turns out to be less convenient in the context of a major feature for Java 8 - lambda expressions.
In Java 8, we can write a function which adds 2 to an integer like this:
Function<Integer, Integer> fn = x -> x + 2;
The interface Function is new with Java 8 - it resides in the package java.util.function, along with specialized forms for primitive types. However, we've chosen this syntax as it is very similar to the Scala equivalent and allows the developer to see the similarities more easily.
By explicitly specifying the type of fn as a Function which takes one Integer argument and returns another Integer, then the Java compiler is able to infer the type of the parameter x - Integer. This is the same pattern that we saw in Java 7 diamond syntax - we specify the types of variables, and infer the type of values.
Let's look at the corresponding Scala lambda expression:
val fn = (x : Int) => x + 2;
Here, we have to explicitly specify the type of the parameter x, as we don't have the precise type of fn, and so we have nothing to infer from. The Scala form is not hugely difficult to read, but the Java 8 form has a certain cleanliness of syntax which can be directly traced back to the diamond syntax of Java 7.
Method Handles are simultaneously the most important new feature of Java 7, and the one which is least likely to feature in the day-to-day life of most Java developers.
A method handle is a typed reference to a method for execution. They can be thought of as "typesafe function pointers" (for developers familiar with C/C++) or as "Core Reflection reimagined for the modern Java developer".
Method handles play a huge part in the implementation of lambda expressions. Early prototypes of Java 8 had each lambda expression converted to an anonymous inner class at compile time.
More recent betas are more sophisticated. Let's start by recalling that a lambda expression (at least in Java) comprises a function signature (which in the method handles API will be represented by a MethodType object) and a body, but not necessarily a function name.
This suggests that we could convert the lambda expression into a synthetic method which has the correct signature and which contains the body of the lambda. For example, our example:
Function<Integer, Integer> fn = x -> x + 2;
is turned by the Java 8 compiler into a private method with this bytecode:
private static java.lang.Integer lambda$0(java.lang.Integer); descriptor: (Ljava/lang/Integer;)Ljava/lang/Integer; flags: ACC_PRIVATE, ACC_STATIC, ACC_SYNTHETIC Code: stack=2, locals=1, args_size=1 0: aload_0 1: invokevirtual #13 // Method java/lang/Integer.intValue:()I 4: iconst_2 5: iadd 6: invokestatic #8 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer; 9: areturn
This has the correct signature (takes in an Integer, and returns another one), and semantics. To use this lambda expression, we will take a method handle which refers to it and use it to build an object of the appropriate type, as we'll see in the next feature we discuss.
The final feature of Java 7 that opens the door for Java 8 is even more esoteric than method handles. This is the new bytecode invokedynamic - the first new bytecode to be added to the platform since Java 1.0. This feature is almost impossible for Java developers to make use of in version 7, because version 7 javac will not, under any circumstances, emit a classfile which contains it.
Instead, the bytecode was designed for use by developers of non-Java languages, such as JRuby, which require much more dynamic dispatch than Java. To see how invokedynamic works, let's discuss how Java's method calls are compiled into bytecode.
A standard Java method call will be turned into a piece of JVM bytecodes which is often referred to as a call site. This comprises a dispatch opcode (such as invokevirtual, for regular instance method calls) and a constant (an offset into the Constant Pool of the class) which indicates which method is to be called.
The different dispatch opcodes have different rules that govern how method lookup is done, but until Java 7 the constant was always a straightforward indication of which method was to be called.
invokedynamic is different. Instead of providing a constant which directly indicates which method is to be called, invokedynamic instead provides an indirection mechanism that allows user code to decide which method to call at runtime.
When an invokedynamic site is first encountered, it does not have a known target yet. Instead, a method handle (called a bootstrap method) is invoked. This bootstrap method returns a CallSite object, which contains another method handle, which is the actual target of the invokedynamic call.
1) invokedynamic site encountered in the execution stream (initially unlinked) 2) Call bootstrap method and return a CallSite object 3) CallSite object contains a method handle (the target) 4) Invoke the target method handle
The bootstrap method is the way in which user code chooses which method needs to be called. For lambda expressions, the platform uses a library-supplied bootstrap method called a lambda meta-factory.
This has static arguments which contain a method handle to the synthesized method (see last section), and the correct signature for the lambda.
The meta-factory returns a CallSite that contains a method handle which will in turn return an instance of the correct type that the lambda expression has been converted to. So a statement like:
Function<Integer, Integer> fn = x -> x + 2;
is converted to an invokedynamic call like this:
Code: stack=4, locals=2, args_size=1 0: invokedynamic #2, 0 // InvokeDynamic #0:apply:()Ljava/util/function/Function; 5: astore_1
The invokedynamic bootstrap method is the static method LambdaMetafactory.metafactory(), which returns a CallSite object which is linked to a target method handle, which will return an o bject which implements the Function interface.
When the invokedynamic instruction is complete, an object which implements Function and which has the lambda expression as the contents of its apply() method is sat on top of the stack, and the rest of the code can proceed normally.
Getting lambda expressions into the Java platform was always going to be a challenging endeavour, but by ensuring that the proper groundwork was in place, Java 7 eased that effort considerably. Plan B not only provided developers with the early release of Java 7 but also allowed core technologies to be made fully road-tested before their use in Java 8 and especially in lambda expressions.
About the Author
Ben Evans is the CEO of jClarity, a startup which delivers performance tools to help development & ops teams. He is an organizer for the LJC (London JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. He is a Java Champion; JavaOne Rockstar; co-author of “The Well-Grounded Java Developer” and a regular public speaker on the Java platform, performance, concurrency, and related topics.
in Scala, values have explicit types, and it is the type of variables that is inferred. In Java 7, the type of variables is explicit, and type information about values is what is inferred.
Umm. Scala's type inference is not perfect by any stretch, but it certainly handles this:
val fn: Int => Int = x => x + 1
Re: Type Inference
InvokeDynamic: incredibly complex
A point of comparison: in a 'native' implementation of functional languages, a function call (including a call to a lambda) can be done with a single machine instruction. The complete entry from beginning of call site to executing the body can be done in a small handful of x86 instructions.
I would suggest that the only reason for considering InvokeDynamic is because the JVM class loader is so slooow that small class files represent a bottleneck.
Re: InvokeDynamic: incredibly complex
I don't think your hypothesis is based on reality.
First, very few calls can be done with "a single machine instruction", even in assembly.
Second, higher level languages use indirection and varying levels of polymorphism, some of which can be resolved at compile time, and some of which is necessary to resolve at runtime. What invokedynamic provides is a means to address the latter, with some flexibility between whether every call with have the same destination, or each call will need to be evaluated individually to resolve the destination.
Lastly, invokedynamic is designed such that the native code can (in the end) be the same as if you had written it in assembly, i.e. no more complex than necessary. In other words, if the destination is resolvable and unchanging, the function call can be as close to your mythical "single machine instruction" as possible, and in fact the JVM could even inline the function.
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.
Anatole Tresch Mar 03, 2015