In their continuing quest to make Java more terse, the language architects at Oracle are exploring pattern matching as a new language feature. Brian Goetz, Java language architect at Oracle, and Gavin Bierman, programming language researcher at Oracle, spoke to InfoQ about the prospect of incorporating pattern matching into the Java programming language.
Motivation
The motivation for this research is to improve upon some common Java programming idioms. Consider the following:
if (obj instanceof Integer) {
int intValue = ((Integer) obj).intValue();
// use intValue
}
There are three operations at work:
- A test to determine if obj is of type Integer
- A conversion that casts obj to type Integer
- A destructuring operation that extracts an int from Integer
Now consider testing against other data types within an if...else construct:
String formatted = "unknown";
if (obj instanceof Integer) {
int i = (Integer) obj;
formatted = String.format("int %d", i);
}
else if (obj instanceof Byte) {
byte b = (Byte) obj;
formatted = String.format("byte %d", b);
}
else if (obj instanceof Long) {
long l = (Long) obj;
formatted = String.format("long %d", l);
}
else if (obj instanceof Double) {
double d = (Double) obj;
formatted = String.format("double %f", d);
}
else if (obj instanceof String) {
String s = (String) obj;
formatted = String.format("String %s", s);
}
...
While the above code is commonly used and easily understood, it is tedious because of the repeated boilerplate, and it provides a number of places for bugs to hide. The excess boilerplate also tends to obscure the business logic; for example, the cast seems unnecessary and repetitive once instanceof has already confirmed the type of the instance.
Goetz and Bierman explain the overall aim of their proposed improvements:
Rather than reach for ad-hoc solutions, we believe it is time for Java to embrace pattern matching. Pattern matching is a technique that has been adapted to many different styles of programming languages going back to the 1960s, including text-oriented languages like SNOBOL4 and AWK, functional languages like Haskell and ML, and more recently extended to object-oriented languages like Scala (and most recently, C#).
A pattern is a combination of a predicate that can be applied to a target, along with a set of binding variables that are extracted from the target if the predicate applies to it.
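One way to read this definition is as a function that either fails to match or returns the extracted bindings. The sketch below models that idea in plain Java under that assumption; the Pattern interface and the typeTest helper are hypothetical names invented for illustration and are not part of the Goetz and Bierman proposal.

```java
import java.util.Optional;

// Hypothetical model: a pattern is a predicate plus the binding it extracts.
interface Pattern<T, B> {
    // Returns the binding when the predicate applies to the target, else empty.
    Optional<B> match(T target);
}

final class PatternSketch {
    // A type-test pattern: applies when the target is an instance of `type`
    // and binds the target cast to that type.
    static <B> Pattern<Object, B> typeTest(Class<B> type) {
        return target -> type.isInstance(target)
                ? Optional.of(type.cast(target))
                : Optional.<B>empty();
    }

    // Uses the pattern: the binding is only available when the match succeeds.
    static String describe(Object obj) {
        Pattern<Object, Integer> intPattern = typeTest(Integer.class);
        return intPattern.match(obj)
                .map(i -> String.format("int %d", i))
                .orElse("unknown");
    }
}
```

Here the test and the extraction cannot be separated by the caller, which is the property the definition describes.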
Goetz and Bierman have been experimenting with various patterns that include new keywords such as matches and exprswitch.
Operator matches
A proposed matches operator would eliminate the need for a separate instanceof test and cast. For example:
if (x matches Integer i) {
// use i here
}
The binding variable, i, is only bound if the variable, x, matches an Integer. Expanding the above to cover the other data types within an if...else construct eliminates the unnecessary casts.
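The matches keyword is proposed syntax and does not compile in any released JDK. As a rough equivalent, the sketch below uses the instanceof type pattern that later shipped in Java 16, so it assumes a Java 16 or newer compiler; the class and method names are made up for illustration.

```java
final class TypePatternSketch {
    // Each branch tests the type and binds the cast value in one step,
    // so no explicit cast appears anywhere in the chain.
    static String describe(Object obj) {
        String formatted = "unknown";
        if (obj instanceof Integer i) {
            formatted = String.format("int %d", i);
        } else if (obj instanceof Byte b) {
            formatted = String.format("byte %d", b);
        } else if (obj instanceof Long l) {
            formatted = String.format("long %d", l);
        } else if (obj instanceof Double d) {
            formatted = String.format("double %f", d);
        } else if (obj instanceof String s) {
            formatted = String.format("String %s", s);
        }
        return formatted;
    }
}
```

Each binding variable is only in scope inside the branch where its test succeeded, which removes one of the places where a mismatched cast could hide.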
Improvement on switch
Goetz and Bierman explain that “the switch statement is a perfect ‘match’ for pattern matching.” For example:
String formatted;
switch (obj) {
case Integer i: formatted = String.format("int %d", i); break;
case Byte b: formatted = String.format("byte %d", b); break;
case Long l: formatted = String.format("long %d", l); break;
case Double d: formatted = String.format("double %f", d); break;
case String s: formatted = String.format("String %s", s); break;
default: formatted = "unknown";
}
...
The above code is much easier to read and is much less cluttered. However, Goetz and Bierman point out a limitation of switch: “It is a statement, and therefore the case arms must be statements, too. We’d like an expression form that is a generalization of the ternary conditional operator, where we’re guaranteed that exactly one of N expressions are evaluated.”
Their solution is to propose a new expression form, exprswitch:
String formatted =
exprswitch (obj) {
case Integer i -> String.format("int %d", i);
case Byte b -> String.format("byte %d", b);
case Long l -> String.format("long %d", l);
case Double d -> String.format("double %f", d);
case String s -> String.format("String %s", s);
default -> "unknown";
};
...
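Since exprswitch is proposed syntax only, the "exactly one of N expressions" behaviour can be approximated with the construct it generalizes: a chain of ternary conditionals. This is a sketch, not the proposed feature; it assumes Java 16 or newer for the instanceof type patterns, and the class name is hypothetical.

```java
final class ExpressionFormSketch {
    // The whole chain is a single expression: exactly one branch expression
    // is evaluated and its value becomes the result.
    static String describe(Object obj) {
        return obj instanceof Integer i ? String.format("int %d", i)
             : obj instanceof Byte b    ? String.format("byte %d", b)
             : obj instanceof Long l    ? String.format("long %d", l)
             : obj instanceof Double d  ? String.format("double %f", d)
             : obj instanceof String s  ? String.format("String %s", s)
             : "unknown";
    }
}
```

Because the result is produced by an expression, there is no mutable local to initialize and no break to forget, which is the limitation of the statement form that the authors describe.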
The patterns proposed by Goetz and Bierman include:
- Type-test patterns (bind the cast target to a binding variable)
- Destructuring patterns (destructure the target and recursively match the components to subpatterns)
- Constant patterns (match on equality)
- Var patterns (match anything and bind their target)
- The _ pattern (matches anything)
Goetz spoke to InfoQ about pattern matching.
InfoQ: What kind of community response have you received since publishing your paper?
Goetz: We've received very positive responses; people who have used pattern matching in other languages really like it, and are happy to see it coming to Java. For folks who've not seen it before, we expect there will be more of an education effort as to why we think this is an important thing to add to the language.
InfoQ: How heavily has the design of Scala's match operator influenced the design so far? Are there any specific things that Scala's match can do which Java pattern matches will not be able to do?
Goetz: Scala is just one of the many languages that has informed our idea of what pattern matching in Java should be. Adding a feature to a language is rarely a matter of "porting" it from some other language; if we did that, it would look like a bag nailed on the side. We will likely end up in a place where we don't do all the things Scala's pattern matching does -- and also doing some things that Scala's does not.
We think that there's an opportunity to integrate pattern matching much more deeply into the object model than Scala has. Scala's patterns are effectively static; they can't be easily overloaded or overridden. While that's still very useful, we think we can do better.
Deconstruction is the dual of construction; just as OO languages give you choices for how to construct objects (constructors, factories, builders), we think having the same choices in deconstruction will lead to richer APIs. While pattern matching has historically been associated with functional languages, we think it has a rightful place in OO too -- one that just has been historically overlooked.
It is easy to get excited about language features, but we think language features should primarily be enablers for better libraries -- because there's so much leverage in having a rich ecosystem of libraries. And pattern matching will definitely enable us to write simpler, safer libraries.
As an example of an OO library that never realized it needed pattern matching, consider the pair of methods in java.lang.Class:
public boolean isArray() { ... }
public Class getComponentType() { ... }
The second method has a precondition -- that the first method return true. Having the logic of an operation spread out over multiple API points means complexity for the API writer (who has to specify more, and test more) and complexity for the user (who can more easily get things wrong.) Logically, these two methods are one pattern, which fuses the applicability test of "does this class represent an array class" and the conditional extraction of the component type. If it were actually expressed as such, it would be easier to write, and impossible to use incorrectly:
if (aClass matches Class.arrayClass(var componentType)) { ... }
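Class.arrayClass(...) above is hypothetical syntax. The same fusion of test and extraction can be approximated today with a helper that returns an Optional; the sketch below assumes nothing beyond the existing isArray() and getComponentType() methods, and the helper name arrayComponentType is invented for illustration.

```java
import java.util.Optional;

final class ArrayClassSketch {
    // Fuses the applicability test (isArray) with the conditional extraction
    // (getComponentType), so a caller cannot ask a non-array class for its
    // component type without first handling the empty case.
    static Optional<Class<?>> arrayComponentType(Class<?> aClass) {
        return aClass.isArray()
                ? Optional.<Class<?>>of(aClass.getComponentType())
                : Optional.<Class<?>>empty();
    }
}
```

A caller would write something like arrayComponentType(aClass).ifPresent(componentType -> ...), which keeps the precondition and the extraction in one API point.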
InfoQ: Is it a goal or a non-goal to provide implementation technology that would allow Scala to rebase its match on top of the Java version under development (in a similar way to how we saw Scala 2.12 rebase traits on top of interfaces)?
Goetz: Just as with Lambda, we expect that, in the course of designing this feature, we will identify sensible building blocks that can go into the underlying platform, that multiple languages can benefit from -- and provide greater interoperability between multiple languages.
InfoQ: In the Scala implementation, significant amounts of additional synthetic bytecode are generated to support features like destructuring of case classes. Does adding a similar amount of VM and bytecode machinery here have any drawbacks?
Goetz: The risk is that the compiler is stepping outside its normal role, and assigning semantics to class members that are usually under the control of the developer. While this is convenient, it may not always be what the user wants -- which often leads to requests for dials and knobs for tweaking the generation (e.g., comparing arrays with Arrays.equals() rather than Object.equals()).
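The array example can be made concrete with a small sketch. It only illustrates the difference between the two comparisons that a generated equals method would have to choose between; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class ArrayEqualitySketch {
    // Object.equals() on an array compares references, not contents.
    static boolean identityEquals(int[] a, int[] b) {
        return a.equals(b);
    }

    // Arrays.equals() compares length and elements.
    static boolean contentEquals(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }
}
```

For two distinct arrays holding the same elements, the first method returns false and the second returns true, so whichever one a compiler picks will be wrong for some users.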
InfoQ: Will destructuring be limited to data classes?
Goetz: We plan to roll out pattern matching over multiple releases, starting with simple patterns like type tests, then destructuring patterns on data classes, and eventually arbitrary user-written destructuring patterns. So while we definitely do not intend to limit destructuring to data classes, there may be some window of time when that is true.
InfoQ: Can you speak to the relationship between data classes and value types?
Goetz: The two are almost completely orthogonal. Values are about aggregates that have no identity; by explicitly disavowing identity, the runtime can optimize in-memory layout, flattening away indirections and object headers, and more freely cache value components across synchronization points. Data classes are about disavowing complex, indirect relationships between a class's representation and its API contract; by doing so, the compiler can fill in common class members like constructors, pattern matchers, equals, hashCode, and toString. A class might be suitable for being a value class, or a data class, or neither, or both; all combinations make sense.
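Data classes as discussed here were still a design idea. The closest feature in released Java is the record, which shipped in Java 16, so the sketch below assumes Java 16 or newer and should be read as an approximation of the idea rather than the design under discussion. The Point type is hypothetical.

```java
final class DataClassSketch {
    // The state description is the whole declaration; the compiler fills in
    // the constructor, accessors, equals, hashCode, and toString.
    record Point(int x, int y) { }

    // Two separately constructed points with the same state compare equal,
    // because the generated equals is derived from the components.
    static boolean sameState() {
        return new Point(1, 2).equals(new Point(1, 2));
    }
}
```

The generated members follow directly from the declared components, which is the "no indirect relationship between representation and API" trade-off the answer describes.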
InfoQ: Sealing requires some support from the source code compiler.
Goetz: Sealing (like finality) not only requires support from the compiler, but ideally would get support from the JVM too -- so that language-level constraints like "X can't extend Y" are enforced by the JVM.
InfoQ: Is the intention for sealing to mean something like "may not be subclassed outside this module"?
Goetz: There's a range of what sealing could mean. The simplest form would mean "can only be extended by other classes declared in the same source file" -- not only is this a common interpretation of sealing, but it is also extremely simple because things cannot get out of sync via separate compilation. One could also define sealing to mean "in the same package" or "in the same module"; one could get even crazier and allow lists of "friends," or complex runtime predicates. We will almost certainly land on the simple end of this spectrum; the return-on-complexity beyond the simplest interpretation drops off pretty quickly.
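The simplest interpretation, "only extended in the same source file", can be sketched with the sealed types that later shipped in Java 17. This assumes a Java 17 or newer compiler and is not necessarily the design being considered in this interview; the Shape hierarchy is hypothetical.

```java
final class SealingSketch {
    // No `permits` clause: the permitted subtypes are the ones declared
    // in this same source file.
    sealed interface Shape { }

    record Circle(double radius) implements Shape { }

    record Square(double side) implements Shape { }

    static double area(Shape shape) {
        if (shape instanceof Circle c) {
            return Math.PI * c.radius() * c.radius();
        }
        if (shape instanceof Square s) {
            return s.side() * s.side();
        }
        // Not reachable while Shape stays sealed to the two types above.
        throw new IllegalArgumentException("unknown shape: " + shape);
    }
}
```

Because the set of subtypes is closed and lives in one file, separately compiled code cannot add a third Shape that the area method has never seen.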
InfoQ: Will the new six-month Java release cycle better facilitate integrating pattern matching into the language?
Goetz: We hope so! We already have identified sensible "chunks" into which pattern matching can be divided, so that we can roll out simple pattern support relatively quickly, and keep building on that.
InfoQ: Will a prototype be available for pioneer users at some point?
Goetz: It already is -- for early adopters who are willing to compile a JDK from source. There's a branch in the "Amber" forest with support for type test patterns in switch and a "matches" predicate.
InfoQ: What’s on the horizon for your pattern matching research?
Goetz: We're still exploring how we want to surface matchers as class members, and the questions surrounding how they play into overloading and inheritance. There's lots to figure out there.
Resources
- Towards Pattern Matching in Java by Kerflyn, May 9, 2012
- Pattern Matching in Java by Benji Weber, May 3, 2014
- Pattern Matching in Java with the Visitor Pattern by Kevin Peterson, February 11, 2015
- Adventures in Pattern Matching by Brian Goetz, JVM Language Summit, August 2017
- Moving Java Faster by Mark Reinhold, September 6, 2017
- Java to Move to 6-Monthly Release Cadence by InfoQ, September 6, 2017