BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Evolving Java Without Changing the Language

Evolving Java Without Changing the Language

In "The Feel of Java" James Gosling stated that:
Java is a blue collar language. It's not PhD thesis material but a language for a job. Java feels very familiar to many different programmers because I had a very strong tendency to prefer things that had been used a lot over things that just sounded like a good idea.

The extraordinary success of Java offers weight to the notion that this was a sensible approach, and if it remains an important goal for Java today, then it makes sense that the language should continue to evolve relatively slowly. In addition to this, the fact that Java is a mature, widely used language causes its evolution to be fraught with difficulty. For one thing, each feature added to the language can change the way it feels in subtle and often unpredictable ways, risking alienating developers who have already adopted it as their language of choice.  For another, a feature that makes perfect sense on its own may interact with other features of the language in awkward or unexpected ways. Worse, once a language feature has been added it is all but impossible to remove even if it turns out to be detrimental to the language as a whole.  To justify adding a new feature, a language designer must be highly confident that it will be of long term benefit to the language rather than a short term or fashionable solution to a problem that rapidly becomes redundant. To mitigate the risk a language designer will typically experiment by creating a separate language or branch, such as the Pizza language used to experiment with Java's generics, prior to their implementation. The problem with this approach is that the audience for such experiments is both small and self-selecting; obviously they will all be interested in language features, and many may be academics or researchers. An idea which plays well to such an audience may still play badly when it is incorporated into the main language and general programmers start to work with it.

To get a sense of this, consider the closures debate that became so heated for Java 7. Implementations for the main proposals (and some others) have been available for some time but no consensus has emerged. In consequence Sun decided that JDK 7 will not get full closures support. The core argument came down to whether Java had become as complex as it could afford to be when generics (and in particular the wildcard syntax) were added to Java 5; and whether the addition of full support for closures was justified when Java already has a more limited form through anonymous inner classes. Two important use cases for adding full closures support were to simplify working with the fork/join API that is being added to JDK 7 to improve multi-core programming, and to help with resource clean-up.  Josh Bloch's ARM block proposal, which is now expected to be in JDK 7 via Project Coin, offers an alternative solution to the latter problem.  Dr. Cliff Click's research on a scalable, non-blocking programming style for Java offers an alternative approach to fork/join that may be more appropriate as the number of processor cores increases. If this were to happen, then the uses for closures in Java may arguably be too limited to justify their inclusion.

It remains important though that a programming language continues to develop at some level. This article therefore examines three alternative techniques for adding new language features to Java that don't require changes to the language itself - using a custom Domain Specific Language, exploiting the Java 6 annotation processor to add optional language features via a library, and moving the syntactic sugar from the language to the IDE. Each offers the potential to allow a wide audience of mainstream developers to experiment with the new features over the medium term in a non-invasive manner, and the best ideas can then filter down for inclusion in the core language.

Custom DSLs

The most widely discussed of the three is the Domain-Specific Language or DSL. There is some disagreement on exactly what the term means, but for the purposes of this discussion we'll refer to it simply as a language that has been created with a narrow focus to solve a particular problem, rather than as a general purpose language designed to solve every computing problem. As such we would expect a DSL to be non-Turing complete and for the most part this is the case. There are edge cases of course. Postscript, for example, is a Turing complete language but also qualifies as a DSL using our definition.

As the above example also illustrates, the idea of a DSL is not new. Other familiar DSLs include Regular Expressions, XSLT, Ant, and JSP, all of which require some sort of custom parser to process them. Martin Fowler also suggests that fluent interfaces/APIs can be considered a second type of DSL, which he refers to as an internal DSL. His definition is that an internal DSL is developed directly within the host language. This was a common practice amongst both Lisp and Smalltalk programmers, and more recently the Ruby community has been popularising the technique.  

Whilst many well-known DSLs are commercially developed and maintained, some enterprise development teams have used the technique to create a language that allows them to rapidly explore aspects of their problem domain. It isn't however as common as it might be, perhaps because DSLs have a fairly intimidating barrier to entry. The team has to design the language, build the parser and possibly other tools to support the programming team, and train each new developer that joins the team on how the DSL works. Here the emergence of tools to specifically support DSL development could significantly change the landscape. Intentional Software's Intentional Domain Workbench, which has been in development longer than Java has been around, is the first significant implementation of such a tool. The project started life at Microsoft Research, and Dr. Charles Simonyi's 1995 paper "The Death of Computer Languages, the Birth of Intentional Programming" describes his vision. In 2002 Simonyi founded Intentional Software to continue working on his ideas and a hugely impressive video demo of the system is available. The product itself is at 1.0 status, but access is restricted to very limited partners.

Other software houses are also exploring the concepts, amongst them JetBrains, well respected for their IntelliJ IDEA Java IDE, who have recently released the 1.0 version of their Meta Programming System (MPS). MPS doesn't use a parser, instead working with the Abstract Syntax Tree (AST) directly. It provides a text-like projectional editor which allows the programmer to manipulate the AST, and is used to write languages and programs. For each node in the tree a textual projection is created - as the programmer works with the projection, the change is reflected in the node. This approach allows you to extend and embed languages in any combination (often referred to as language composing) promoting language re-use. JetBrains are using the product internally and have recently released YouTrack, a bug tracking product developed using the system.

The Java 6 Annotation Processor

Whilst DSLs are less common in more mainstream languages such as Java than they are in Ruby, Smalltalk and Lisp, recent developments in the Java language, in particular the annotation processor which was added in Java 6, offer new possibilities for developers looking to use them in Java. The JPA 2.0 criteria API that will ship as part of Java EE 6, itself a DSL, offers an example. Here the annotation processor builds up a metamodel type for each persistent class in the application. Whilst it would be perfectly possible for the developer to hand craft the metamodel in Java, it would be both tedious and error prone. The use of the annotation processor eliminates that pain and, since the annotation processor is built into Java 6, the approach requires no specific IDE support – an IDE delegates to the annotation processor triggered by the compiler, and the metadata model is generated on the fly. 

Using the annotation processor it is also possible for a library to add a new language feature. Bruce Chapman's prototype "no closures" proposal, for example, uses the technique to provide a mechanism for casting a method to a Single Abstract Method (SAM) type which compiles on top of Java 6. During our conversation Chapman pointed out that the SAM type also supports free variables, a key aspect of a closure: 

The method body can declare additional parameters beyond those required for the Single Abstract Method using the @As.Additional annotation. These parameters can have values bound to them at the point where you obtain an instance of the SAM type, and are then passed to the method each time it is invoked.

Chapman also set up the Rapt project to explore other uses of the technique, and has added implementations for two language changes - Multiline Strings and XML literals - that were considered for JDK 7 but won't now make it into the final release. Java could even get a form of closures support using this approach.  When asked about this, Chapman said:

We are just finishing a Swing project which we used it for. We have found a couple of minor bugs around generic types, one recently discovered remains to be fixed but other than that it seems quite nice to use, and nobody has been wanting to rush back to use conventional anonymous inner classes.

Project Lombok, another project exploring the the annotation processor, pushes the technique still further. In effect Lombok uses annotation processing as a hook to run a Java agent that re-writes various javac internals based on the annotations. Since it is manipulating internal classes it is probably not suited to production use (internal classes can change even between minor releases of the JVM) but the project is an eye-opening example of just what can be done using the annotation processor, including:

  • Support for properties using a pair of @Getter and/or @Setter annotations with varying access levels, e.g. @Setter(AccessLevel.PROTECTED) private String name;
  • The @EqualsAndHashCode annotation, which generates hashCode() and equals() implementations from the fields of your object
  • The @ToString annotation, which generates an implementation of the toString() method
  • The @data method, which is equivalent to combining @ToString, @EqualsAndHashCode, @Getter on all fields, and @Setter on all non-final fields along with a constructor to initialize your final fields

Other language experimentation, such as removing checked exceptions from Java, can also be done using this approach.

Whilst the annotation processor technique opens up a welcome new route to language experimentation, care needs to be taken that the generated code can be easily read by developers, not just by the machine. Chapman made a number of suggestions during our conversation:

Generate source code not bytecode, and pay attention to formatting (indenting especially) in the generated code. The compiler won't care whether it is all on one line or not, but your users will. I even sometimes add comments and javadoc in the source code generated by my annotation processors where appropriate.

Hopefully if the technique becomes more prevalent IDEs will also make it easier to view the code that is to be generated at compile time.

Syntactic Sugar in the IDE

Bruce Chapman also touches on our third technique - moving the syntactic sugar from the language to the IDE - in his blog and he elaborated on his ideas during our conversation. It is already routine for Java IDEs to create portions of boilerplate code for you such as the getters and setters of a class, but IDE developers are beginning to push the concept further. JetBrains' IntelliJ 9 offers a terse code block syntax for inner classes similar to a closure, which a developer can also type. Acting like code folds, these can then be expanded into the full anonymous inner classes which the compiler works with - this allows developers who prefer to stick with the standard anonymous inner class syntax to do so. A similar plug-in for Eclipse also exists. The key point here is that the "alternate" syntax is just a view of the actual code which the compiler and any source management tools continue to work with. Thus the developer should be able to switch views between either form (like expanding or collapsing a code fold), and anyone without access to the definition of the sugar just sees the normal Java code. Chapman writes:

There are many details to work out in order to make this easily accessible, but long term I see developers relatively easily defining a two way sugaring/desugaring transformation (jackpot is a good start for how this might be done), trying them out, evolving them and sharing the good ones with colleagues and the community. The advantages of this are almost the same as for a language change, without the disadvantages. The very best could become ubiquitous and then form the basis of an actual language change if necessary to get rid of any remaining "noise" not possible with this approach.

Since syntactic sugar has to map to another (more verbose) language feature it cannot offer complete closure support; there are some features of BGGA closures for example that cannot be mapped to anonymous inner classes, and so they couldn't be implemented through this approach. Nevertheless the idea opens up the possibility of having various new syntaxes for representing anonymous inner classes, similar to BGGA syntax or FCM syntax, and allowing developers to pick the syntax they want to work with. Other language features, such as the null-safe Elvis operator, could certainly be done this way. To experiment further with the idea this NetBeans module also developed by Chapman, is what he describes as a "barely functional" prototype for Properties using this approach.

Conclusion

In language development there is always a trade-off between stability and progress. The advantage that all of these techniques bring is that they don't affect the platform or the language. In consequence they are more tolerant to mistakes and are therefore more conducive to rapid and radical experimentation. With developers freely able to experiment we should begin to see more people separately tackling the poor signal to noise ratio of some common boilerplate such as the anonymous inner class syntax, mixing and evolving these ideas to some optimum form that adds the most value in the most cases. It will be fascinating to see how developers use these different approaches to push the Java platform in new directions.

Rate this Article

Adoption
Style

BT