Project Lambda from the Inside: An Interview with Brian Goetz
This past April, as reported on InfoQ, Oracle announced that the highly anticipated release of Java 8 would be delayed until Q1 2014.
Mark Reinhold, Chief Architect of the Java Platform Group at Oracle, said on his blog:
"the most important work that slipped past M6 is related to Project Lambda, the sole driving feature of the release.
We integrated the language and VM changes for Lambda late last year, but between all the moving parts involved and the security work it’s taken a bit longer than expected to put the finishing touches on the stream API and the related core-libraries enhancements."
InfoQ spoke to Oracle's Brian Goetz, the JSR-335 spec lead, about his observations of Project Lambda from the inside.
InfoQ: There is a lot of coordination on this project, from parallel collections to the new Stream API. Without giving away any secrets, can you give us some insight into the JSR-335 project from an insider's perspective? How is all of the coordination managed?
Brian: Indeed, the number of interactions between moving parts is intimidating, and there were a large number of discrete stakeholders: two interlocking expert group lists (one purely for language features, and a broader one including the members of the JSR-166 EG for library features), the OpenJDK community, and multiple component teams within Oracle's Java Platform Group. But the payoff was worth it; unlike with some earlier platform evolution efforts, where we had to implement everything via syntactic sugar in the compiler, JSR-335 was able to undertake a coordinated co-evolution of the language, libraries, and Virtual Machine (VM), yielding a much better overall result.
In my view, the key to succeeding here is to maintain a clear focus on the goals. Language features are not a goal unto themselves; language features are enablers, encouraging or discouraging certain styles and idioms. Even within the category of "add lambda expressions", there are often hidden goals that condition the approach. The BGGA proposal had an underlying goal of supporting control abstraction through libraries; CICE focused on the more modest goal of alleviating the syntactic pain of inner classes; when lambdas were added to C#, many of the use cases were dictated by the needs of LINQ.
For the JSR-335 effort, we maintained a clear focus that language features are primarily a means to better libraries, such as bulk data-parallel operations on existing collections. Language features enable better libraries; better libraries enable simpler, clearer, less error-prone user code. So we gathered a catalog of user idioms we wanted to enable, such as this bulk data-parallel operation on a collection:

    int sumOfWeights = anArrayList.parallelStream()
                                  .filter(b -> b.getColor() == BLUE)
                                  .mapToInt(b -> b.getWeight())
                                  .sum();

We used these examples as touchstones to ensure we were moving towards our goals, and not just moving.
Implicit in this simple example are:
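The pipeline above can be turned into a small runnable sketch. The `Block` class, the `Color` enum, and the sample data are hypothetical stand-ins for whatever `anArrayList` held in the original example (so `BLUE` becomes `Color.BLUE` here); the stream methods used (`parallelStream`, `filter`, `mapToInt`, `sum`) are the ones that shipped in Java SE 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SumOfWeights {
    // Hypothetical domain types standing in for the interview's "b" elements.
    enum Color { RED, BLUE }

    static final class Block {
        private final Color color;
        private final int weight;

        Block(Color color, int weight) {
            this.color = color;
            this.weight = weight;
        }

        Color getColor() { return color; }
        int getWeight() { return weight; }
    }

    // The interview's pipeline: keep blue blocks, map each to its weight, sum in parallel.
    static int sumOfBlueWeights(List<Block> blocks) {
        return blocks.parallelStream()
                .filter(b -> b.getColor() == Color.BLUE)
                .mapToInt(b -> b.getWeight())
                .sum();
    }

    public static void main(String[] args) {
        // An ordinary, non-thread-safe ArrayList serves as the stream source.
        List<Block> anArrayList = new ArrayList<>(Arrays.asList(
                new Block(Color.BLUE, 3),
                new Block(Color.RED, 10),
                new Block(Color.BLUE, 4)));
        System.out.println(sumOfBlueWeights(anArrayList)); // prints 7
    }
}
```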
- The need to compactly encode behavior in expressions (code as data)
- The need to move control of iteration from the client (using language features like the for-loop) to libraries (internal iteration)
- The need to add new methods to existing types like List.parallelStream (interface evolution)
- The need for much greater type inference (so that the compiler can infer the types of the lambda formals)
- The need to be able to do parallel operations on *non-thread-safe* data structures
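The shift from external to internal iteration in that list can be illustrated by computing the same result both ways. This is a minimal sketch; the method names and data are hypothetical. In the first method the client owns the loop, so traversal is fixed and sequential; in the second, the client passes behavior as lambdas (whose parameter types the compiler infers) and the library decides how to traverse the data, including splitting the work for a parallel stream.

```java
import java.util.Arrays;
import java.util.List;

public class IterationStyles {
    // External iteration: the client drives the loop, so it is inherently sequential.
    static int sumEvenExternal(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            if (v % 2 == 0) {
                sum += v;
            }
        }
        return sum;
    }

    // Internal iteration: the client supplies behavior and the library controls traversal.
    // The type of 'v' is inferred as Integer from List<Integer>.
    static int sumEvenInternal(List<Integer> values) {
        return values.parallelStream()
                .filter(v -> v % 2 == 0)
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumEvenExternal(values)); // prints 12
        System.out.println(sumEvenInternal(values)); // prints 12
    }
}
```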
One of the biggest challenges of doing larger language features is the risk of feature creep. When you're making a big change already, there is always the temptation to sneak a few others in while you're at it. (We received hundreds of suggestions from the community along these lines, and had to say no to nearly all of them.) Having a clear set of goals about what this effort is about was key to being able to say "that may be cool, but that's out of scope."
Another key coordination tool was the use of formal mathematical models in the design of language features, such as the inheritance rules for default methods or the behavior of type inference. Natural language is particularly poorly suited for discussing complex problems like these; misunderstandings are guaranteed to creep in. The use of formal models allowed us to discuss and understand the proposed semantics of a feature in a precise and unambiguous way, as well as provide a blueprint for specification, implementation, and testing.
InfoQ: It seems most of Project Lambda's activity is within OpenJDK. How has that improved community participation?
Brian: We've got a dedicated crew of early adopters on the lambda-dev mailing list that regularly download the latest builds (or build from source) and try out the new features. This experience-driven feedback is absolutely critical to the process. Early adopters find bugs and usability issues when the cost of making course corrections is low. The most valuable thing the community can do to help is: try the code out yourself, and report your experiences (positive or negative.)
InfoQ: Almost all of the JSRs since Java's inception have been strictly compile-time changes, the most notable exception being invokedynamic. Are there any bytecode-level changes involved in JSR 335?
Brian: The implementation of lambda expressions leans heavily on invokedynamic, introduced in Java SE 7. Having invokedynamic in the compiler writer's toolbox enabled us to avoid a lot of VM work we might otherwise have been tempted to do. However, there are two features that we added that have VM impact -- default methods (the VM has to get involved in the inheritance, which ultimately affects the semantics of the invokevirtual, invokeinterface, and invokespecial bytecodes), and static methods in interfaces (mostly a relaxation of existing classfile restrictions.)
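The default-method inheritance rules the VM has to honor can be sketched as follows, using the hypothetical types `Walker`, `Swimmer`, `FastWalker`, `Duck`, `Runner`, `Base`, and `Robot`. A method inherited from a class wins over any interface default; otherwise the most specific interface default wins; and if two unrelated defaults conflict, the class must override the method, optionally delegating with `Interface.super.method()`, which javac compiles to an `invokespecial` instruction.

```java
public class DefaultMethods {
    interface Walker {
        default String move() { return "walk"; }
    }

    interface Swimmer {
        default String move() { return "swim"; }
    }

    // Two unrelated defaults conflict, so Duck must override move() itself.
    static class Duck implements Walker, Swimmer {
        @Override
        public String move() {
            // Explicitly select each inherited default via Interface.super.
            return Walker.super.move() + "+" + Swimmer.super.move();
        }
    }

    // A more specific interface's default wins over the default it overrides.
    interface FastWalker extends Walker {
        @Override
        default String move() { return "run"; }
    }

    static class Runner implements Walker, FastWalker { }

    // A concrete method inherited from a superclass wins over any interface default.
    static class Base {
        public String move() { return "base"; }
    }

    static class Robot extends Base implements Swimmer { }

    public static void main(String[] args) {
        System.out.println(new Duck().move());   // prints walk+swim
        System.out.println(new Runner().move()); // prints run
        System.out.println(new Robot().move());  // prints base
    }
}
```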
InfoQ: It looks like IntelliJ IDEA has already implemented some JSR-335 support. I presume the IDE vendors got a preview edition to work with some time ago. Can you anticipate when we might start seeing support in NetBeans and Eclipse?
Brian: We made sure all the IDE vendors were represented on the EG, both to ensure that the features were supportable via tooling, and that the IDE vendors would have early access to the specs. NetBeans has had lambda-enabled builds for a long time; they have a head-start since NetBeans uses javac as its compiler. IntelliJ has significant JSR-335 support already, both in early-access builds and their shipping 12.x product. Of course, all of these are waiting for the ink to be dry on the spec; until then, nothing is final.
InfoQ: One of the exciting features of Lambda is its functional-language look and feel, with the Streams APIs for pipelining and concurrent collections. Can you speak to how these were designed, implemented, and coordinated?
Brian: Much of this was working backwards from desired use cases of what we wanted users to be able to do easily. We looked at libraries from other languages, as well as Java libraries like LambdaJ, paying special attention to their "showcase" examples, and used these as a catalog of "requirements" while we explored what sort of model was needed to support them all. We also felt that it was critically important that the stream operations work both sequentially and in parallel, and that existing collections be usable as sources for streams -- even non-thread-safe collections as sources for parallel streams. We planned for three distinct iterations of the API design, each building on what we learned from the previous iteration.
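A minimal sketch of the sequential/parallel symmetry described above, using an ordinary `ArrayList` (which is not thread-safe) as the source. The `shout` method and the word list are hypothetical. Because the library, not the client, partitions the source and merges partial results, the parallel run yields the same encounter-ordered list as the sequential one, provided nothing modifies the source while the pipeline executes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SequentialVsParallel {
    // The same pipeline, run either sequentially or in parallel.
    static List<String> shout(List<String> words, boolean parallel) {
        return (parallel ? words.parallelStream() : words.stream())
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // ArrayList is not thread-safe, yet it is a valid parallel-stream source
        // as long as nothing mutates it while the pipeline runs.
        List<String> words = new ArrayList<>(Arrays.asList("lambda", "is", "here", "at", "last"));
        System.out.println(shout(words, false)); // prints [LAMBDA, HERE, LAST]
        System.out.println(shout(words, true));  // prints [LAMBDA, HERE, LAST]
    }
}
```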
InfoQ: Lambdas vs. closures: there is some debate about what the difference is. What's your perspective?
Brian: I think getting worked up over whether lambdas as expressed in Java SE 8 are "real" closures or not is a pretty unconstructive activity. To me, the use of the term "real" here should be a syntax error; there are many different languages with closure-like constructs, with varying degrees of similarity and difference. Declaring that language X gets to define what a real closure is and any language that doesn't do it exactly that way isn't real is not helpful.
That said, there were hard decisions to make about whether to support features like mutable local capture, nonlocal control flow, exception transparency, and other features, and having left some of these out means certain use cases are harder to express naturally; these are tradeoffs between expressiveness and complexity, and we tried to spend our complexity budget where it had the most impact for developers. It's fair to talk about the pros and cons of each of these decisions, but framing this in terms of "real vs fake closures" is not a constructive way to have this dialog.
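One of those tradeoffs, the omission of mutable local capture, is easy to bump into. In Java SE 8 a lambda may only capture local variables that are final or effectively final, so the commented-out lines in the sketch below would be rejected by the compiler. The sketch (method name hypothetical) shows the usual alternative: capture only effectively final values and express accumulation as a reduction rather than a side effect.

```java
import java.util.Arrays;
import java.util.List;

public class CaptureRules {
    // Adds 'bonus' to every element and sums the results.
    static int sumWithBonus(List<Integer> values, int bonus) {
        // Mutable local capture is not supported; this would not compile:
        //   int sum = 0;
        //   values.forEach(v -> sum += v + bonus); // 'sum' is not effectively final
        //
        // Capturing 'bonus' is allowed because it is never reassigned,
        // and the accumulation is expressed as a reduction instead.
        return values.stream()
                .mapToInt(v -> v + bonus)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithBonus(Arrays.asList(1, 2, 3), 10)); // prints 36
    }
}
```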
InfoQ: Closures seem to introduce a paradox with regard to interfaces. When we think of interfaces, we think of a structure that may contain data and structure but is devoid of implementation. Closures, on the other hand, are implementation as data. So will Project Lambda allow us to implement functionality in an interface by defining constant closure fields?
Brian: Code is data. (Gödel taught us this almost 100 years ago.) Lambda expressions simply make it syntactically easier to express behavior that can easily be treated as data.
That said, the notion that interfaces are devoid of implementation is one that is changing with Java SE 8. To support interface evolution, we're allowing interfaces to provide methods with default bodies that can be inherited by classes, and we're also allowing static methods in interfaces. (Interfaces with default methods can be considered a form of stateless traits.)
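Both parts of this answer fit in one small interface. In the sketch below, the hypothetical `Greeter` interface holds a lambda in a constant field (interface fields are implicitly `public static final`), declares a static factory method, and provides a default method that every implementation, including a lambda, inherits. The interface still carries no instance state, which is why default methods invite the comparison to stateless traits.

```java
import java.util.function.Function;

public class InterfaceEvolution {
    // A functional interface: exactly one abstract method, so lambdas can implement it.
    interface Greeter {
        String greet(String name);

        // Behavior held as data in a constant field; implicitly public static final.
        Function<String, String> SHOUT = s -> s.toUpperCase() + "!";

        // Static method in an interface (new in Java SE 8).
        static Greeter polite() {
            return name -> "Hello, " + name;
        }

        // Default method: inherited by every implementation, including lambdas.
        default Greeter loudly() {
            return name -> SHOUT.apply(greet(name));
        }
    }

    public static void main(String[] args) {
        Greeter g = Greeter.polite();
        System.out.println(g.greet("Duke"));          // prints Hello, Duke
        System.out.println(g.loudly().greet("Duke")); // prints HELLO, DUKE!
    }
}
```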
InfoQ: Lambda introduces an "Optional" type. How does that help eliminate null references?
Brian: Our use of Optional is very limited; it is essentially restricted to being used as the return type for methods that might not return anything. Take a method like Map.get(key), which returns null if the specified key is not in the map. This has two serious defects. The first is that null might be a valid value for a map element (some maps allow this.) Now you can't tell the difference between "not there" and "mapped to null", and if you have to make a second call to containsKey() to determine which, you've invited a race condition. The second is that it is really easy to write code that forgets to check for null, making your code less reliable. (And the null checking also makes your code uglier.) Of course, it's too late to save Map.get(), but we don't have to keep making the same mistake.
An Optional can either describe a non-null value or can be explicitly empty; you can't blindly dereference the return value of an Optional-bearing method because the type system will prevent you. So you have to explicitly decide what to do if the optional is empty; Optional has methods for get() (which throws an exception if the Optional is empty), getOrElse(defaultValue), getOrThrow(Supplier<E extends Throwable>), etc. So you can explicitly but unobtrusively encode what you intend to do with absent values -- throw an exception, substitute a default value, etc.
That said, those who are familiar with Option in Scala are likely to be disappointed; this is not a deep change to the type system; it is simply a helper class in the library to more explicitly reflect "this method may not return anything" without co-opting null for that purpose.
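The sketch below contrasts the two styles. It assumes the released Java SE 8 `Optional` API, in which the methods described above as `getOrElse` and `getOrThrow` appear as `orElse(T)` and `orElseThrow(Supplier)`. The helper names and data are hypothetical; `Stream.findFirst()` is one of the library methods that returns an `Optional`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class OptionalDemo {
    // Map.get returns null both for "not there" and for "mapped to null".
    static boolean getIsAmbiguous() {
        Map<String, String> map = new HashMap<>();
        map.put("present-but-null", null); // HashMap permits null values
        return map.get("present-but-null") == null && map.get("absent") == null;
    }

    // Hypothetical lookup that returns Optional instead of a possibly-null value.
    static Optional<String> firstLongWord(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() >= minLength)
                .findFirst(); // empty Optional if nothing matches
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "filter", "reduce");
        System.out.println(getIsAmbiguous()); // prints true

        // Substitute a default value when the result is absent.
        System.out.println(firstLongWord(words, 10).orElse("none")); // prints none

        // Or throw a caller-chosen exception when the result is absent.
        String w = firstLongWord(words, 6)
                .orElseThrow(() -> new IllegalStateException("no match"));
        System.out.println(w); // prints filter
    }
}
```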
To follow the latest on Project Lambda, please visit the Project Lambda page on the OpenJDK website.
About the Interviewee
Brian Goetz is the Java Language Architect at Oracle, and is the specification lead for JSR-335 (Lambda Expressions for the Java Language.) He is the author of the best-selling book "Java Concurrency in Practice" and is a frequent presenter at major industry conferences.
Code as data and Godel
The duality between code and data long predates the dawn of computing. It featured prominently in the work of Church and Turing, whose work on the foundations of computation (1930s) was influenced by that of Gödel, who is credited with first coming up with the trick of encoding statements or formulas within a logical system as numbers that could be described within that system. Programming languages like Lisp, through functions like "apply" and "eval", raise the code-data duality to a first-class concept. Other languages require greater degrees of hoop-jumping; some make it so hard that we might be fooled into thinking that code and data are completely separate beasts.
The whole point is that users tend to think that code and data are different, but they're not, and the distinction doesn't even help us very much. It might even hurt us.
Dmytro Svarytsevych Oct 30, 2014