

Java Feature Spotlight: Pattern Matching


Jan 22, 2021


Key Takeaways

Java SE 14 (March 2020) introduced a limited form of pattern matching as a preview feature, which becomes a permanent feature in Java SE 16 (March 2021).
The first phase of pattern matching is limited to one kind of pattern (type patterns) and one language construct (instanceof), but this is only the first installment in a longer feature arc.
At the simplest level, pattern matching allows us to reduce the ceremony of conditional state extraction, where we ask a question about some object (such as "are you a Foo"), and, if the answer is positive, we extract some state from the target: e.g. "if (x instanceof Integer i) { ... }", where i is the binding variable.
Binding variables are subject to definite assignment, but take this one step further: the scope of a binding variable is the set of places in the program where it would be definitely assigned. This is called flow scoping.
Pattern matching is a rich feature arc that will play out over several Java versions. Future installments will bring us patterns in switch, deconstruction patterns on records, and more, with the aim of making destructuring objects as easy as (and more structurally similar to) constructing them.

Preview Features

Given the global reach and high compatibility commitments of the Java platform, the cost of a design mistake in a language feature is very high. In the context of a language misfeature, the commitment to compatibility not only means it is very difficult to remove or significantly change the feature, but existing features also constrain what future features can do -- today's shiny new features are tomorrow's compatibility constraints.

The ultimate proving ground for language features is actual use; feedback from developers who have actually tried them out on real codebases is essential to ensure that the feature is working as intended. When Java had multi-year release cycles, there was plenty of time for experimentation and feedback. To ensure adequate time for experimentation and feedback under the newer rapid release cadence, new language features will go through one or more rounds of preview, where they are part of the platform, but must be separately opted into, and which are not yet permanent -- so that in the event they need to be adjusted based on feedback from developers, this is possible without breaking mission-critical code.

Java SE 14 (March 2020) introduced a limited form of pattern matching as a preview feature, which becomes a permanent feature in Java SE 16 (March 2021).

The first phase of pattern matching is limited to one kind of pattern (type patterns) and one language construct (instanceof), but this is only the first installment in a longer feature arc.

At the simplest level, pattern matching allows us to reduce the ceremony of conditional state extraction, where we ask a question about some object (such as "are you a Foo"), and, if the answer is positive, we extract some state from the target.

Querying an object's type with instanceof is a form of conditional extraction, because invariably the next thing we do is cast the target to that type, extracting a reference to a narrowed type.

A typical example can be found in the copy constructor of java.util.EnumMap:

public EnumMap(Map<K, ? extends V> m) {
    if (m instanceof EnumMap) {
        EnumMap<K, ? extends V> em = (EnumMap<K, ? extends V>) m;
        // optimized copy of map state from em
    } else {
        // insert elements one by one
    }
}

The constructor takes another Map, which might or might not be an EnumMap. If it is, the constructor can cast it to EnumMap and use a more efficient way to copy the map state, otherwise it falls back to a generic approach.

The test-and-cast idiom seems needlessly redundant -- what else would we do right after learning m instanceof EnumMap? Pattern matching allows us to (among other things) collapse the test-and-cast into a single operation. A type pattern combines a type name with a declaration for a binding variable, which will be bound to the narrowed type of the target if the instanceof succeeds:

public EnumMap(Map<K, ? extends V> m) {
    if (m instanceof EnumMap<K, ? extends V> em) {
        // optimized copy of map state from em
    } else {
        // insert elements one by one
    }
}

In the example above, EnumMap<K, ? extends V> em is a type pattern. (That it looks like a variable declaration is no accident.) We extend instanceof to accept patterns as well as plain types; asking whether m matches this pattern means that we first test that it is an EnumMap, and if so, cast it to EnumMap and bind the result to em in the first arm of the if statement.
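To make the test, cast, and bind steps concrete, here is a minimal sketch in the same spirit as the EnumMap constructor. It is not JDK code; the class FirstElement and its methods are hypothetical. Both methods return the first element of a non-empty collection, using direct indexing when the collection turns out to be a List. The first uses the test-and-cast idiom, the second a type pattern.

```java
import java.util.Collection;
import java.util.List;

class FirstElement {
    // Test-and-cast, in the style of the EnumMap copy constructor.
    // Assumes c is non-empty.
    static <E> E firstOld(Collection<E> c) {
        if (c instanceof List) {
            List<E> list = (List<E>) c;   // the cast repeats the type we just tested
            return list.get(0);           // fast path: direct index access
        }
        return c.iterator().next();       // generic fallback
    }

    // Same logic with a type pattern: one test, and list is bound
    // (already narrowed to List<E>) only if the test succeeds.
    static <E> E first(Collection<E> c) {
        if (c instanceof List<E> list) {
            return list.get(0);
        }
        return c.iterator().next();
    }
}
```

As in the EnumMap example, the type argument in List<E> is accepted because, given a Collection<E>, any List must be a List<E>.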

That we have to cast after the instanceof was always a bit of unfortunate ceremony, but the benefit of fusing these operations is not mere concision (though the concision is nice); it also eliminates a common source of error. It is easy to cut and paste an instanceof/cast pair, change the operand of instanceof, and forget to change the cast. Repetition like this gives bugs a place to hide; by eliminating their habitat, we can eliminate whole categories of bugs.
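The cut-and-paste bug described above can be shown in a few lines. This is a contrived, hypothetical example (CastBugDemo and its methods are illustrative only). The buggy version still compiles, because casting an Object to String is always legal at compile time, so the mistake only shows up at run time. In the pattern version the narrowed variable comes from the same type name as the test, so the two cannot drift apart.

```java
class CastBugDemo {
    // Test-and-cast: the second cast was copy-pasted and never updated.
    // It compiles, but throws ClassCastException for a StringBuilder.
    static int lengthOld(Object o) {
        if (o instanceof String)
            return ((String) o).length();
        if (o instanceof StringBuilder)
            return ((String) o).length();   // bug: should cast to StringBuilder
        return -1;
    }

    // Type patterns: each test declares its own correctly typed binding.
    static int lengthNew(Object o) {
        if (o instanceof String s)
            return s.length();
        if (o instanceof StringBuilder sb)
            return sb.length();
        return -1;
    }
}
```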

Another place where we routinely test-then-cast is in implementing Object::equals. An IDE might generate the following equals() method for a Point class:

public boolean equals(Object o) {
    if (!(o instanceof Point))
        return false;
    Point p = (Point) o;
    return x == p.x && y == p.y;
}

This code is straightforward enough, but the short-circuiting control flow does make it slightly harder to follow what the code does. Here's the equivalent code using a pattern match:

public boolean equals(Object o) {
    return (o instanceof Point p)
        && x == p.x && y == p.y;
}

This code is just as efficient, but is more straightforward, because we can express the equality condition as a single compound boolean expression rather than as statements with ad-hoc control flow. The scoping for the binding variable p is flow-sensitive; it is only in scope where it would be definitely assigned, such as in the expressions conjoined with &&.
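For reference, here is a complete, compilable version of such a Point class. The fields, the constructor and the hashCode are assumptions added to make the sketch self-contained. The class is declared final so that an instanceof-based equals stays symmetric, and hashCode is overridden to stay consistent with equals. Because instanceof is false for null, equals(null) needs no separate check.

```java
import java.util.Objects;

final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        // p is in scope only to the right of &&, where the instanceof succeeded;
        // for null or non-Point arguments the whole expression is false.
        return (o instanceof Point p)
            && x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);   // consistent with equals
    }
}
```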

If all pattern matching did was eliminate 99% of the casts in Java code, it would surely still be popular, but the promise of pattern matching runs considerably deeper. Over time, there will be other sorts of patterns that can perform more complex conditional extractions, more sophisticated ways to compose patterns, and other constructs that can use patterns (such as switch and maybe even catch.) Together with the related features of records and sealed classes, pattern matching holds the potential to simplify and streamline much of the code we write today.

Scoping of binding variables

A pattern embodies a test, a conditional extraction of state from the target if the test succeeds, and a way to declare binding variables to receive the results of the extraction. We have seen one kind of pattern so far: type patterns. These are denoted as T t, where the applicability test is instanceof T, there is a single element of state to be extracted (casting the target reference to T), and t is the name of a fresh variable to receive the result of the cast. Currently patterns can only be used on the right-hand side of instanceof.

The binding variables of a pattern are "ordinary" local variables, but they have two novel aspects: the location of their declaration, and their scoping. We are used to local variables being declared "at the left margin" via top-level statements (Foo f = new Foo()) or in the headers of statements such as for loops and try-with-resources blocks. Patterns declare local variables "in the middle" of a statement or expression, which may take a little time to get used to; in:

if (x instanceof Integer i) { ... }

the occurrence of i on the right-hand side of the instanceof is actually the declaration of the local variable i.

The other novel aspect of binding variables is their scoping. The scope of an "ordinary" local variable runs from its declaration until the end of the statement or block in which it is declared. Locals are further subject to definite assignment, a flow-based analysis that prevents us from reading a local variable when we cannot prove it has already been assigned. Binding variables are also subject to definite assignment, but take it one step further: the scope of a binding variable is the set of places in the program where it would be definitely assigned. This is called flow scoping.

We've already seen a simple example of flow scoping; in the declaration of the equals method of Point, where we say:

return (o instanceof Point p)
    && x == p.x && y == p.y;

The binding variable p is declared in the instanceof expression, and because && is short-circuiting, we can only get to x == p.x if the instanceof expression is true, so p is definitely assigned in the expression x == p.x, and therefore p is in scope at this point. But, if we had replaced the && with ||, we would get an error saying p is not in scope, because it is possible to get to the second clause of a || expression without the first clause being true, and therefore p would not be definitely assigned at that point. (The definite assignment rules in the specification are written in a somewhat abstruse style, but they do comport with our intuition about what happens before what.)
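A small sketch of the && case next to the rejected || case. The class and method names are hypothetical; the commented-out method is there only to show the version the compiler refuses.

```java
class FlowScoping {
    // s is definitely assigned (and so in scope) in the right operand of &&,
    // which is evaluated only if the instanceof test succeeded.
    static boolean isNonEmptyString(Object o) {
        return o instanceof String s && !s.isEmpty();
    }

    // Replacing && with || is a compile-time error: the right operand of ||
    // is evaluated only when the test FAILED, so s is not in scope there.
    // static boolean broken(Object o) {
    //     return o instanceof String s || !s.isEmpty();
    // }
}
```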

Similarly, if a pattern match appears in the header of an if statement, the bindings will be in scope in one or the other arm of the if, but not both:

if (x instanceof Foo f) {
    // f in scope here
}
else {
    // f not in scope here
}

and similarly:

if (!(x instanceof Foo f)) {
    // f not in scope here
}
else {
    // f in scope here
}

Because scoping of binding variables is tied to control flow, refactorings such as inverting the if condition or applying De Morgan's laws will transform the scoping in exactly the same manner as they transform the control flow.
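As a hedged illustration (all names here are hypothetical), each pair of methods below computes the same result. Inverting the if moves the binding i from the then-arm to the else-arm, and rewriting !(A && B) as !A || !B keeps i usable, because the right operand of || is reached only when !(x instanceof Integer i) is false, that is, when the match succeeded.

```java
class InvertedIf {
    static String describe(Object x) {
        if (x instanceof Integer i) {
            return "int " + (i + 1);   // i in scope in the then-arm
        } else {
            return "other";            // i not in scope here
        }
    }

    // Condition inverted: the binding follows the control flow to the else-arm.
    static String describeInverted(Object x) {
        if (!(x instanceof Integer i)) {
            return "other";            // i not in scope here
        } else {
            return "int " + (i + 1);   // i in scope here
        }
    }

    static boolean notPositiveInt(Object x) {
        return !(x instanceof Integer i && i > 0);
    }

    // De Morgan's law applied to the method above.
    static boolean notPositiveIntDeMorgan(Object x) {
        return !(x instanceof Integer i) || i <= 0;
    }
}
```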

One might wonder why this more complex scoping approach was chosen when we could have continued with the old "scope runs to the end of the block" rules that we've always had for locals. And the answer is: we could have, but we probably wouldn't have liked the result. Java prohibits shadowing of locals by locals; if the scope of a binding variable ran until the end of the containing block, then chains like:

if (x instanceof Integer num) { ... }
else if (x instanceof Long num) { ... }
else if (x instanceof Double num) { ... }

would be engaging in illegal redeclaration of num, and we'd have to make up a fresh name for each occurrence. (The same would be true when we get to patterns in case labels in switch.) By making num not even be in scope by the time we get to the else, we are free to redeclare a new num (with a new type) in the else clauses.
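A runnable version of such a chain, as a sketch (NumChain and kind are hypothetical names). Each num is scoped to its own then-arm, so reusing the name with a different type in each else clause is legal.

```java
class NumChain {
    static String kind(Object x) {
        if (x instanceof Integer num) return "Integer " + num;
        else if (x instanceof Long num) return "Long " + num;      // a new num, of type Long
        else if (x instanceof Double num) return "Double " + num;  // and another, of type Double
        else return "not a number we handle";
    }
}
```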

Flow scoping also allows us to get the best of both worlds with respect to whether the scope of a binding variable escapes its declaring statement. In the if-else examples above, the binding variable was in scope in one arm or the other of the if-else, but not both, and not in the statements following the if-else. But, if one arm or another of the if always completes abruptly (such as returning or throwing an exception), we can use this to extend the scope of the binding -- which turns out to usually be what we want.

Suppose we have code like:

if (x instanceof Foo f) {
    useFoo(f);
}
else
    throw new NotFooException();

This code is fine, but fussy; many developers would prefer to refactor as:

if (!(x instanceof Foo f))
    throw new NotFooException();

useFoo(f);

The two do the same thing, but the latter reduces the cognitive load on the reader in several ways. The "happy" code path stands out clearly; by having it at the top level, rather than subordinate to an if (or worse, a deeply nested set of ifs), it is front-and-center in our perception. Further, by checking the preconditions on entry to the method and throwing if the preconditions fail, readers do not have to keep the "but what if it is not a Foo" scenarios in their head; the precondition failures have been dealt with ahead of time.

In the latter example, f is in scope for the remainder of the method because it would be definitely assigned there: there is no way to get to the useFoo() call without x being a Foo and therefore binding f to the result of casting x to Foo -- because the body of the if always throws. The definite assignment analysis takes abrupt completion into account. Without flow scoping, we would have had to write:

if (!(x instanceof Foo f))
    throw new NotFooException();
else {
    useFoo(f);
}

Not only would some developers find it irritating that the happy path is relegated to an else block, but as the number of preconditions increases (especially if one is dependent on another), the structure gets progressively more complicated and the happy-path code gets shunted farther and farther to the right.
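Here is a sketch of several dependent preconditions written in the flat style. The configuration format (a Map holding an Integer under the key "port") is a hypothetical assumption made for illustration. Each guard throws, so its binding stays in scope for the rest of the method; the second guard uses map from the first, and the happy path stays at the left margin.

```java
import java.util.Map;

class Preconditions {
    // Hypothetical lookup of an Integer "port" entry in an untyped config object.
    static int port(Object config) {
        if (!(config instanceof Map<?, ?> map))
            throw new IllegalArgumentException("config is not a Map");
        if (!(map.get("port") instanceof Integer p))   // depends on map from the first guard
            throw new IllegalArgumentException("port missing or not an Integer");
        return p;   // both map and p are definitely assigned here
    }
}
```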

One other new consideration is the interplay of pattern matching with generics. In our EnumMap example, it might appear that we were testing not only for the class of the target, but for its type parameters too:

public EnumMap(Map<K, ? extends V> m) {
    if (m instanceof EnumMap<K, ? extends V> em) {
        // optimized copy of map state from em
    }
    ...

But, we know that generics in Java are erased, and we're not allowed to ask questions the runtime can't answer. So what's going on here? The type test here has a static and a dynamic component. The compiler knows that EnumMap<K, V> is a subtype of Map<K, V> (EnumMap<K, V> extends AbstractMap<K, V>, which implements Map<K, V>), so if a Map<K, V> is an EnumMap (the dynamic test), it must be an EnumMap<K, V>. The compiler checks the proposed type parameters in the type pattern for consistency with what is known about the target; if casting the target to the type being tested would have resulted in an unchecked conversion, the pattern is not allowed. So we can use type parameters in instanceof, but only to the degree they can be statically validated for consistency.
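The sketch below shows which generic type patterns the compiler accepts under this rule (GenericPatterns and its methods are hypothetical). Given a List<String>, an ArrayList can only be an ArrayList<String>, so the type argument is validated statically and only the class is tested at run time. From Object, a List<String> test would need an unchecked cast and is rejected, while the unbounded wildcard List<?> asks nothing about the erased element type.

```java
import java.util.ArrayList;
import java.util.List;

class GenericPatterns {
    // Allowed: the target's static type fixes the type argument.
    static void appendAll(List<String> target, List<String> items) {
        if (target instanceof ArrayList<String> al) {
            al.ensureCapacity(al.size() + items.size());   // ArrayList-only optimization
        }
        target.addAll(items);
    }

    // Allowed: an unbounded wildcard needs no unchecked conversion.
    static int sizeIfList(Object o) {
        return (o instanceof List<?> l) ? l.size() : -1;
    }

    // Rejected at compile time: Object -> List<String> would be unchecked.
    // static boolean bad(Object o) { return o instanceof List<String> ls; }
}
```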

Where is this going?

So far, this pattern matching feature is extremely limited; there is one kind of pattern (type patterns) and one context (instanceof) where patterns can be used. Even with this limited support, we already get a significant benefit: redundant casts go away, which eliminates redundant code and brings the more important code into sharper focus, while also eliminating a place for bugs to hide. But this is just the start of the good things that pattern matching will bring to Java.

The obvious next context to which we might add pattern matching is the switch statement, which is currently limited to a narrow set of types (numbers, strings and enums) and a narrow set of conditions we can express on those types (constant comparison). Allowing patterns, rather than just constants, in case labels dramatically increases the expressive power of switch; we could then switch over all types and express much more interesting multi-way conditionals than mere comparison to a set of constants.

A bigger reason for adding pattern matching to Java is that it provides us with a more principled means of breaking down aggregates into their state components. Java's objects give us abstraction via aggregation and encapsulation. Aggregation allows us to abstract data from the specific to the general, and encapsulation helps us ensure the integrity of aggregate data. But, we often pay a high price for this integrity; we often do want to allow consumers to query the state of objects, so we provide APIs to do so in a controlled way (such as accessor methods.) But these APIs for state access are often ad-hoc, and the code for creating an object looks nothing like the code for taking it apart (we construct a Point from its (x,y) state with new Point(x, y), but recover the state by calling getX() and getY().) Pattern matching addresses a long-standing gap in the object model by bringing destructuring -- the dual of construction -- into the object model.

A prime example of this is deconstruction patterns on records. Records are a concise form of transparent data-carrier classes; this transparency means that their construction is reversible. Just as records automatically acquire a host of members (constructors, accessors, Object methods), they can also automatically acquire deconstruction patterns, which we can think of as "constructors in reverse" -- a constructor takes state and aggregates it into an object, and a deconstruction pattern takes that object and deconstructs it back into state. If we construct a Shape with:

Shape s = new Circle(new Point(3, 4), 5);

we can deconstruct the resulting shape with:

if (s instanceof Circle(Point center, int radius)) {
    // center and radius in scope here
}

The pattern Circle(Point center, int radius) is a deconstruction pattern. It asks whether the target is a Circle, and if so, casts it to Circle and extracts the center and radius components (in the case of a record, it does this by calling the corresponding accessor methods.)
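Deconstruction patterns are a future feature, so the sketch below only approximates, using Java 16 records and type patterns, what the text says Circle(Point center, int radius) does for a record: a type test and cast, followed by one accessor call per component. It is not how the compiler translates the pattern. The Shape, Point, Circle and Square declarations are assumptions made to give the example something to run against.

```java
class CircleDemo {
    interface Shape { }
    record Point(int x, int y) { }
    record Circle(Point center, int radius) implements Shape { }
    record Square(Point corner, int side) implements Shape { }

    // Roughly "if (s instanceof Circle(Point center, int radius))", spelled out by hand.
    static String describe(Shape s) {
        if (s instanceof Circle c) {
            Point center = c.center();   // accessor for the first component
            int radius = c.radius();     // accessor for the second component
            return "(" + center.x() + "," + center.y() + ") r=" + radius;
        }
        return "not a circle";
    }
}
```

Note the asymmetry that the next paragraphs address: construction is a single nested expression, but taking the value apart here takes a test, a cast and separate accessor calls.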

Deconstruction patterns also offer an opportunity for composition; the Point component of the Circle is itself an aggregate which can be deconstructed, which we can express using a nested pattern as follows:

if (s instanceof Circle(Point(int x, int y), int radius)) {
    // x, y, and radius all in scope here
}

Here, after we extract the center component of the Circle, we further match the result to the Point(int x, int y) pattern. There are several important symmetries here. First, the syntactic expression of construction and deconstruction are now structurally similar -- we can use similar idioms to build things up and take them apart. (In the case of records, both can be derived from the state description of the record.) Previously this was a significant asymmetry; we built things with constructors, but took them apart with ad-hoc API calls (such as getters) that looked nothing like the idioms for aggregation. This asymmetry imposes cognitive load on developers, and also provides bugs with a place to hide. Secondly, construction and deconstruction now compose in the same way -- we can nest the Point constructor call in the Circle constructor call, and we can nest the Point deconstruction pattern in the Circle deconstruction pattern.

Records, sealed types, and deconstruction patterns work together in a pleasing way. Suppose we have this set of declarations for an expression tree:

sealed interface Node {
    record ConstNode(int i) implements Node { }
    record NegNode(Node n) implements Node { }
    record AddNode(Node left, Node right) implements Node { }
    record MultNode(Node left, Node right) implements Node { }
}

We can write an evaluator for this with a pattern switch as follows:

int eval(Node n) {
    return switch (n) {
        case ConstNode(int i) -> i;
        case NegNode(var node) -> -eval(node);
        case AddNode(var left, var right) -> eval(left) + eval(right);
        case MultNode(var left, var right) -> eval(left) * eval(right);
        // no default needed, Node is sealed and we covered all the cases
    };
}

Using switch in this way is more concise and less error-prone than the corresponding chain of if-else tests, and the switch expression knows that if we have covered all of the permitted subtypes of a sealed type, then it is total and no catch-all default is needed.
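For comparison, here is the "corresponding chain of if-else tests", written as a sketch with only Java 16 features: records and type patterns, no sealed interface and no pattern switch. The record declarations mirror the ones above; NodeEval is a hypothetical holder class. Because the compiler cannot check this chain for exhaustiveness, it needs a fallback throw, and a forgotten node type would only be discovered at run time.

```java
class NodeEval {
    interface Node { }   // not sealed: any class may implement it
    record ConstNode(int i) implements Node { }
    record NegNode(Node n) implements Node { }
    record AddNode(Node left, Node right) implements Node { }
    record MultNode(Node left, Node right) implements Node { }

    static int eval(Node n) {
        if (n instanceof ConstNode c) return c.i();
        else if (n instanceof NegNode neg) return -eval(neg.n());
        else if (n instanceof AddNode add) return eval(add.left()) + eval(add.right());
        else if (n instanceof MultNode mul) return eval(mul.left()) * eval(mul.right());
        else throw new IllegalArgumentException("unknown node: " + n);  // required fallback
    }
}
```

For example, the tree for 2 + 3 * -4 evaluates to -10.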

Records and sealed classes together are sometimes referred to as algebraic data types; adding in pattern matching on records and switching on patterns allows us to abstract over algebraic data types safely and simply. (It is no accident that languages with built-in tuple and sum types also tend to have built-in pattern matching.)

Pattern matching throughout history

Pattern matching may be new to Java, but it is not new; it has a long history across many languages (arguably going back as far as the text-processing language SNOBOL in the 1960s.) Many developers today associate pattern matching with functional languages, though this is largely an accident of history. Pattern matching is indeed a good fit for statically typed functional languages, which tend to have built-in structural types for tuples and sequences, as pattern matching is an ideal tool for taking apart these sorts of aggregates. But pattern matching makes just as much sense in object-oriented languages as in functional ones. Scala and F# were the first to experiment with pattern matching in object-functional hybrids; Java will (eventually) bring pattern matching more deeply into the object model.

The roadmap for pattern matching extends even farther than described here -- allowing ordinary classes to declare deconstruction patterns alongside constructors, and static patterns (e.g., case Optional.of(var contents)) alongside static factories. Together, we expect this will usher in an age of more "symmetric" API design, where we can take things apart as easily and as regularly as we put them together (but only when we want this, of course).

Roads not taken

It is a long-standing request for the compiler to be able to infer refined types based on past conditionals (often called flow typing.) For example, if we have an if statement conditioned on x instanceof Foo, the compiler could infer that, inside the body of the if, the type of x can be refined to the intersection type X&Foo (where X is the static type of x.) This would also eliminate the cast that we have to issue today. So, why didn't we do this? The simple answer is: it's a significantly weaker feature. Flow typing solves this particular problem, but offers dramatically lower payback, in that pretty much all it does is get rid of the casts after instanceof -- it offers no story for richer switch, or destructuring, or enabling better APIs. (As language features go, it is more of a "band aid" than a real enabler.)

Similarly, another long-standing request is "type switch", where you could switch over the type of the target, not just constant values. Again, this offers a tangible benefit -- turning some if-else chains into switches -- but again, has far less runway to improve the language overall. Pattern matching gives us these benefits -- but also far more.

Summary

Pattern matching is a rich feature arc that will play out over several versions. The first installment allows us to use type patterns in instanceof, which reduces the ceremony of such code, but future installments will bring us patterns in switch, deconstruction patterns on records, and more, with the aim of making destructuring objects as easy as (and more structurally similar to) constructing them.

About the Author

Brian Goetz is the Java Language Architect at Oracle and was the specification lead for JSR-335 (Lambda Expressions for the Java Programming Language.) He is the author of the best-selling Java Concurrency in Practice and has been fascinated by programming since Jimmy Carter was President.
