Brian Goetz Speaks to InfoQ on Data Classes for Java

On his continuing quest for productivity and performance in the Java programming language, Brian Goetz, Java Language Architect at Oracle, introduced an experimental concept of data classes that has potential to someday be integrated into the language. His research demonstrates a natural fit of data classes with up-and-coming features such as value types and pattern matching. But there is much work to be done before this concept is ready to become part of the Java language. Goetz explored the problems and tradeoffs of data classes on the premise that sometimes "data is just data."

Motivation

Java classes typically require lots of boilerplate code regardless of how simple or how complex those classes may be. This has led to Java's reputation of being "too verbose." Goetz explains:

To write a simple data carrier class responsibly, we have to write a lot of low-value, repetitive code: constructors, accessors, equals(), hashCode(), toString(), etc. And developers are sometimes tempted to cut corners such as omitting these important methods, leading to surprising behavior or poor debuggability, or pressing an alternate but not entirely appropriate class into service because it has the "right shape" and they don't want to define yet another class.

IDEs will help you write most of this code, but writing code is only a small part of the problem. IDEs don't do anything to help the reader distill the design intent of "I'm a plain data carrier for x, y, and z" from the dozens of lines of boilerplate code. And repetitive code is a good place for bugs to hide; if we can, it is best to eliminate their hiding spots outright.
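To make the boilerplate concrete, here is a minimal sketch of what a hand-written carrier for two integers might look like in Java as it stands. The class name BoilerplatePoint and its getter names are illustrative assumptions, not taken from Goetz's paper.

```java
import java.util.Objects;

// Illustrative, hand-written data carrier for two ints (hypothetical name).
final class BoilerplatePoint {
    private final int x;
    private final int y;

    // Constructor: maps the arguments one-to-one onto the fields.
    BoilerplatePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Read accessors.
    int getX() { return x; }
    int getY() { return y; }

    // State-based equality over both components.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof BoilerplatePoint)) return false;
        BoilerplatePoint that = (BoilerplatePoint) other;
        return x == that.x && y == that.y;
    }

    // Hash code derived from the same components as equals().
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public String toString() {
        return "BoilerplatePoint[x=" + x + ", y=" + y + "]";
    }
}
```

Only the two field declarations carry the design intent of "a plain data carrier for x and y"; every other member restates it.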

Scala (case), Kotlin (data), and C# (record) provide class declarations that are designed to be compact, and a Java class could potentially act as a plain data carrier with similarly little overhead. Without a formal definition of a plain data carrier, however, most Java developers would most likely be unable to recognize one. And while the Java community would indeed welcome a data class mechanism in the language, individual interpretations of a plain data carrier could be vastly different. Goetz used the parable of the blind men and an elephant to explain:

Algebraic Annie will say "a data class is just an algebraic product type." Like Scala's case classes, they come paired with pattern matching, and are best served immutable (and for dessert, Annie would order sealed interfaces).

Boilerplate Billy will say "a data class is just an ordinary class with better syntax", and will likely bristle at constraints on mutability, extension, or encapsulation (Billy's brother, JavaBean Jerry, will say "these must be for JavaBeans -- so of course I get getters and setters too." And his sister, POJO Patty, remarks that she is drowning in enterprise POJOs, and reminds us that she'd like these to be proxyable by frameworks like Hibernate).

Tuple Tommy will say "a data class is just a nominal tuple" -- and may not even be expecting them to have methods other than the core Object methods -- they're just the simplest of aggregates (he might even expect the names to be erased, so that two data classes of the same "shape" can be freely converted).

Values Victor will say "a data class is really just a more transparent value type."

All of these personae are united in favor of "data classes" -- but have different ideas of what data classes are, and there may not be any one solution that makes them all happy.

Understanding the Problem

The concept of data classes goes beyond reduction in boilerplate code, which Goetz maintains is "just a symptom of a deeper problem" in which the cost of encapsulation is shared among all Java classes. The object-oriented principles of abstraction and encapsulation allow Java developers to write robust and safe code across various boundaries:

  • Maintenance boundaries
  • Security and trust boundaries
  • Integrity boundaries
  • Versioning boundaries

For classes such as SocketInputStream, these boundaries are essential due to its inherent complexity. But does a class that is a plain data carrier for, say, two integer components (such as the example declared below) really need to be concerned with such boundaries?

    
record Point(int x, int y) { ... }
    

Goetz explains:

Since the cost of establishing and defending these boundaries (how constructor arguments map to state, how to derive the equality contract from state, etc.) is constant across classes, but the benefit is not, the cost may sometimes be out of line with the benefit. This is what Java developers mean by "too much ceremony" -- not that the ceremony has no value, but that they're forced to invoke it even when it does not offer sufficient value.

The encapsulation model that Java provides -- where the representation is entirely decoupled from construction, state access, and equality -- is just more than many classes need. Classes that have a simpler relationship with their boundaries can benefit from a simpler model where we can define a class as a thin wrapper around its state, and derive the relationship between state, construction, equality, and state access from that.

Further, the costs of decoupling representation from API go beyond the overhead of declaring boilerplate members; encapsulation is, by its nature, information-destroying.

Requirements for Data Classes

Using the Point declaration above, consider its "de-sugared" definition as a plain data carrier:

    
final class Point extends java.lang.DataClass {
    public final int x;
    public final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // destructuring pattern for Point(int x, int y)
    // state-based implementations of equals(), hashCode(), and toString()
    // public read accessors x() and y()
}
    

To further study the design of plain data carriers, Goetz defined a set of requirements (or constraints) to "safely and mechanically generate the boilerplate for constructors, pattern extractors, accessors, equals(), hashCode(), and toString() -- and more." He writes:

We say a class C is a transparent carrier for a state vector S if:

  • There is a function ctor : S -> C which maps an instance of the state vector to an instance of C (the constructor may reject some state vectors as invalid, such as rational numbers whose denominator is zero).
  • There is a total function dtor : C -> S which maps an instance of C to a state vector S in the domain of ctor.
  • For any instance c of C, ctor(dtor(c)) is equal to c, according to the equals() contract for C.
  • For two state vectors s1 and s2, if each of their components is equal to the corresponding component of the other (according to the component's equals() contract), then either ctor(s1) and ctor(s2) are both undefined, or they are equal under the equals() contract for C.
  • For equivalent instances c and d, invoking the same operation produces equivalent results: c.m() equals d.m(). Moreover, after the operation, c and d should still be equivalent.

These invariants are an attempt to capture our requirements; that the carrier is transparent, and that there is a simple and predictable relationship between the class's representation, its construction, and its destructuring -- that the API is the representation.
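The invariants can be illustrated with a small hand-written sketch based on the rational-number example mentioned in the first requirement. The Rational class, the array encoding of the state vector, and the CarrierLaws helper are all assumptions made for illustration; they are not part of Goetz's proposal.

```java
// Hypothetical carrier for the state vector (num, den); all names are illustrative.
final class Rational {
    final int num;
    final int den;

    // ctor : S -> C. Partial: a zero denominator is rejected as an invalid state vector.
    Rational(int num, int den) {
        if (den == 0) {
            throw new IllegalArgumentException("denominator must not be zero");
        }
        this.num = num;
        this.den = den;
    }

    // Array-based spelling of ctor so that it composes directly with dtor().
    static Rational ctor(int[] state) {
        return new Rational(state[0], state[1]);
    }

    // dtor : C -> S. Total: every instance yields a state vector that ctor accepts.
    int[] dtor() {
        return new int[] { num, den };
    }

    // Equality is derived from the state vector and nothing else.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Rational)) return false;
        Rational that = (Rational) other;
        return num == that.num && den == that.den;
    }

    @Override
    public int hashCode() {
        return 31 * num + den;
    }
}

final class CarrierLaws {
    // Third requirement: ctor(dtor(c)) is equal to c.
    static boolean roundTrips(Rational c) {
        return Rational.ctor(c.dtor()).equals(c);
    }

    // Fourth requirement, for state vectors that ctor accepts:
    // component-wise equal state vectors build equal instances.
    static boolean equalStatesBuildEqualInstances(int[] s1, int[] s2) {
        return Rational.ctor(s1).equals(Rational.ctor(s2));
    }
}
```

In this sketch equality compares the stored components directly, so 1/2 and 2/4 are distinct instances; whether a carrier may normalize its state is one of the design questions the requirements leave open.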

Data Classes and Pattern Matching

A plain data carrier has the advantage, as Goetz states, of being able "to freely convert a data class instance back and forth between its aggregate form and exploded state." This pairs naturally with pattern matching. In his pattern matching paper, Goetz discussed destructuring and improvements to the switch construct. With this in mind, it could be possible to write the following code:

    
interface Shape { ... }
record Point(int x, int y) { ... }
record Rect(Point p1, Point p2) implements Shape { ... }
record Circle(Point center, int radius) implements Shape { ... }

...

switch (shape) {
    case Rect(Point(var x1, var y1), Point(var x2, var y2)): ...
    case Circle(Point(var x, var y), int radius): ...
}
    

Any concrete instance of Shape could easily be destructured within the switch statement. This could also be useful for externalization such as serialization, marshalling to/from JSON and XML, and database mapping.
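For comparison, the following sketch shows roughly what the nested patterns above stand in for when written by hand without the proposed feature: a type test followed by component-by-component extraction. The classes are hand-written stand-ins for the proposed records, and ShapeDescriber and describe() are hypothetical names introduced for illustration.

```java
interface Shape { }

// Hand-written stand-ins for the proposed record declarations.
final class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

final class Rect implements Shape {
    final Point p1;
    final Point p2;
    Rect(Point p1, Point p2) { this.p1 = p1; this.p2 = p2; }
}

final class Circle implements Shape {
    final Point center;
    final int radius;
    Circle(Point center, int radius) { this.center = center; this.radius = radius; }
}

final class ShapeDescriber {
    static String describe(Shape shape) {
        // Stands in for: case Rect(Point(var x1, var y1), Point(var x2, var y2))
        if (shape instanceof Rect) {
            Rect r = (Rect) shape;
            int x1 = r.p1.x, y1 = r.p1.y;
            int x2 = r.p2.x, y2 = r.p2.y;
            return "rect " + Math.abs(x2 - x1) + "x" + Math.abs(y2 - y1);
        }
        // Stands in for: case Circle(Point(var x, var y), int radius)
        if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            int x = c.center.x, y = c.center.y;
            return "circle at (" + x + "," + y + ") r=" + c.radius;
        }
        throw new IllegalArgumentException("unknown shape: " + shape);
    }
}
```

Each branch repeats the same three steps (test, cast, extract), which is the repetition a destructuring pattern would fold into a single case label.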

Refining the Design Space

Goetz noted that the requirements for being a plain data carrier come with trade-offs. He explains:

The simplest -- and most draconian -- model for data classes is to say that a data class is a final class with public final fields for each state component, a public constructor and deconstruction pattern whose signature matches that of the state description, and state-based implementations of the core Object methods, and further, that no other members (or explicit implementations of the implicit members) are allowed. This is essentially the strictest interpretation of a nominal tuple.

This starting point is simple and stable -- and nearly everyone will find something to object to about it. So, how much can we relax these constraints without giving up on the semantic benefits we want? Let's look at some directions in which the draconian starting point could be extended, and their interactions.

These directions cover a wide array of design elements and related issues:

  • Interfaces and additional methods
    • Risk violating the "nothing but the state" rule.
  • Overriding implicit members
    • Risk violating the requirements of a plain data carrier.
  • Additional constructors
    • Ensure the object state and state description are equivalent.
  • Additional fields
    • Risk violating "the state, the whole state, and nothing but the state" rule.
  • Extension
    • Issues related to extension between data classes and regular classes.
  • Mutability
    • Question the rationale of allowing data classes to be mutable.
  • Field encapsulation and accessors
    • Ensure that encapsulated fields remain readable.
  • Arrays and defensive copies
    • Defensive copies violate the invariant of destructuring and reconstructing an array to ensure an equal instance.
  • Thread safety
    • Question how mutability in data classes can be thread safe.
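The point about arrays and defensive copies can be sketched as follows. The example assumes that equality is derived component by component using each component's own equals(), which for arrays is reference identity; Samples and RoundTrip are hypothetical names introduced for illustration.

```java
// Hypothetical carrier with a single array component, protected by defensive copies.
final class Samples {
    private final int[] values;

    // Copy on the way in, so the caller cannot mutate our state afterwards.
    Samples(int[] values) {
        this.values = values.clone();
    }

    // Copy on the way out, for the same reason.
    int[] values() {
        return values.clone();
    }

    // State-based equality that delegates to the component's equals().
    // For arrays that is Object.equals(), i.e. reference identity.
    @Override
    public boolean equals(Object other) {
        return other instanceof Samples && values.equals(((Samples) other).values);
    }

    @Override
    public int hashCode() {
        return values.hashCode();
    }
}

final class RoundTrip {
    // Destructure and reconstruct: read the component out, build a new instance from it.
    static Samples rebuild(Samples original) {
        return new Samples(original.values());
    }
}
```

Under these assumptions, RoundTrip.rebuild(c).equals(c) is false even though both instances hold the same numbers, because each copy produces a distinct array. The defensive copies protect the state but break the requirement that reconstructing from the destructured state yields an equal instance.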

Summary

Java had an excellent year in 2017 and there is much excitement about the language this year. However, as Goetz told InfoQ, data classes are still considered a "half-baked" idea that requires more work to fully understand how this concept can someday be a reality.

In summary, Goetz explains:

The key question in designing a facility for "plain data aggregates" in Java is identifying which degrees of freedom we are willing to give up. If we try to model all the degrees of freedom of classes, we just move the complexity around; to gain some benefit, we must accept some constraints. We think that the sensible constraints to accept are disavowing the use of encapsulation for decoupling representation from API, and for mediating read access to state; in turn, this provides significant syntactic and semantic benefits for classes which can accept these constraints.

Vicente Romero, principal member of the technical staff at Oracle, recently posted an "initial public push" on the development of data classes that can be found on the datum branch of the Project Amber repository.

Goetz spoke to InfoQ about his data classes research:

InfoQ: What kind of community response have you received since publishing your paper?

Brian Goetz: The expected response: some highly positive comments about the idea, and a variety of suggestions (mostly mutually inconsistent) for how it could be "improved." Which is to say, people like the idea, but, as expected, many people would want us to move the design center in one direction, or another, to suit their personal preferences. As a highly subjective feature, this was to be expected.

InfoQ: Do you envision a data class mechanism to someday be integrated in the Java programming language? If so, what kind of effort will be necessary to address all the concerns you discussed in your paper?

Goetz: It is going to require "bake time." With language design, your first idea, no matter how carefully thought out, is going to be wrong. As will your second. Many language features require half a dozen iterations or more before you ultimately discover the right place to land. So we'll be experimenting, prototyping, gathering feedback, iterating, and iterating again. Until we feel we've gotten to the right place.

InfoQ: Is it a goal or a non-goal to promote the rebasing of non-Java languages implementation of compact classes (e.g., Scala case classes) on top of data classes?

Goetz: Every language is going to have its own surface syntax. However, data classes connect with other language features, such as pattern matching, and we hope (as happened with Lambda) that other languages will target the runtime support of these features, and gain interoperability benefits.

InfoQ: As far as you may know, did the architects of Scala, Kotlin, and C# face similar challenges in implementing a more compact class declaration?

Goetz: Indeed so, though both Kotlin and Scala were able to take this on much closer to the beginning of their projects than C# did, so had fewer constraints to navigate. And each settled in a slightly different point in the design space.

InfoQ: What is the single most important take-home message you would like our readers to know about data classes?

Goetz: That data classes are about data, not about syntactic concision. They are about providing a natural means to model pure data in the object model. And not all classes are carriers for pure data, even if they would like the concision benefits that data classes offer.

InfoQ: What's on the horizon for your data classes research?

Goetz: Breaking the features that data classes need into finer-grained features, that might be usable by all classes. For example, even in classes that are clearly not just data carriers, constructors are full of error-prone repetition, which could be replaced by making a higher-level correspondence between constructor parameters and representation. This way, data classes become simpler (just sugar for other language features), and more classes can get the benefit of the feature without trying to shoehorn them into data classes.

Community comments

  • Why not to just include @Data in Java?

    by Javier Paniza,

    It would be simpler just to include the @Data annotation from Lombok in Java:
    projectlombok.org/features/Data

    If you add a feature to the language that doesn't fix a big problem, you're making a worse language.

  • Forget Java

    by Serge Bureau,

    Move to Scala, you already have Data Classes and so much more.

  • Re: Why not to just include @Data in Java?

    by Michael Redlich,

    Hi Javier:

    I'm not sure Lombok's @Data annotation is a viable replacement for a proposed data class mechanism. Remember, this isn't about boilerplate reduction. It's about the ability to write a plain data carrier that is simply just that - a data carrier.

    While Lombok is indeed useful, it does generate the boilerplate for you. And Brian Goetz stated this is where bugs find their way into your code.

    Mike.

  • Re: Forget Java

    by Javier Paniza,

    Hi Serge,

    Move to Scala, you already have Data Classes and so much more.


    That is just the reason to not move to Scala. When I moved from C++ to Java (in 1997) Java had less features than C++, it was nice just for that.

  • Re: Why not to just include @Data in Java?

    by Javier Paniza,

    Hi Mike,

    write a plain data carrier that is simply just that - a data carrier.


    But this was available in Java 1.0. If you add a feature that allows you to write less code, or to do something completely new, that is ok, but in any other case you're complicating the language.

  • Re: Forget Java

    by Stephen Johnston,

    Scala has many interesting ideas, but I think it comes with a lot of baggage as well.
    Powerful yet dangerous features, like scala implicits, are widely abused across many, many projects, and it makes for untenable code.

  • Don't get it

    by Michael McCutcheon,

    How about fixing Java's broken "fake" generics instead so that we have runtime info? That would be a lot more useful than this.

  • Re: Forget Java

    by Serge Bureau,

    Scala is a much smaller language than Java ! So, following your reasoning you should move to it. It is however much more flexible and consistent, so you can do much more with much less code. So go take a look.

  • Re: Forget Java

    by Serge Bureau,

    What a pile of misconceptions, too bad for you

  • Re: Why not to just include @Data in Java?

    by Michael Redlich,

    How so? If this was true, why is Brian Goetz experimenting with this idea of data classes?

  • Re: Don't get it

    by Cergey Chaulin,

    It's being developed in Valhalla project. Yet no one knows when it's ready.

  • Re: Forget Java

    by Richard Clayton,

    What do you mean by smaller? It may take more characters to write something in Java, but the syntax is an order of magnitude less complex than Scala.

  • Re: Forget Java

    by Ben Evans,

    Simply not true, I'm afraid. Java has 53 keywords and a regular (you might say rigid) syntax. There are no legal Java programs which have ambiguous meaning at a grammatical level. The same is just not true of Scala, due to the fluidity of the syntax (e.g. indirect object & block syntax, implicits, macros, etc, etc, etc). Once your Scala project is above a relatively small size, good luck figuring out what any of it does without an IDE.

  • Re: Forget Java

    by Richard Clayton,

    I feel like people who say things like "Scala is smaller than Java" probably only use 1/10 of the syntax -- meaning they use Scala to write compact Java code.

  • Why not a first class struct/record/case type?

    by Richard Clayton,

    Forgive my naivety (I don't know much about language design), but why not implement a first-class language construct? Extending "java.lang.DataClass" with a traditional class declaration seems like more of the boilerplate people already hate in Java.

  • Re: Forget Java

    by Alex Worden,

    I looked. Scala is a "Write Only Language" - like Perl.

  • Re: Forget Java

    by Alex Worden,

    Verbosity of syntax isn't there for the compiler, it's there for the HUMANS. It makes it readable and understandable by mere mortals.

    I'm so tired of this obsession with languages. The language was perfectly capable of building anything as of Java 1.5. Why not focus on building something useful with it instead of fussing with the mechanics of it? People who invest their time arguing about languages and trying to improve things that don't need improvement will never accomplish anything. In fact, quite the opposite, they provide a terrible distraction for 70% of the rest of the development community that don't have the sense to ignore them.

  • Re: Forget Java

    by Serge Bureau,

    No, it is the number of concepts, and you got it the wrong way: Java is an order of magnitude more complex than Scala. Also, there are so many things that cannot be expressed in Java.

  • Re: Forget Java

    by Serge Bureau,

    Thanks, you're providing a good laugh

  • Re: Forget Java

    by Serge Bureau,

    You are free to stay behind. But languages can improve, I hope you will understand that.

  • Re: Forget Java

    by Richard Clayton,

    I think you are confusing "lines of code" for complexity. Language complexity is about features and the syntax you use to implement things. Scala is a moderate-highly complex language (certainly the most complex of the popular JVM languages). Scala basically has every feature Java has and then adds:

    - import statements that can be included anywhere
    - case classes
    - companion objects
    - implicit parameters
    - operators as methods (that don't require classic invocation syntax [dot operator])
    - monads
    - multiple parameter lists
    - much more complex member access control (e.g. private, protected, etc.)
    - pattern matching as a first-class language feature
    - anonymous and nested functions
    - function currying
    - lazy evaluation
    - tuples
    - immutability enforcement

    And I'm sure many more language features I haven't mentioned.

    Scala is not a bad language - it simply takes a lot more knowledge and experience to use it correctly. Yes, you can express things more concisely in Scala; but you can also write incomprehensible code by overusing language features.

  • Re: Forget Java

    by Serge Bureau,

    You named many Scala features, but still miss many.
    The main thing is how you can mix those features; Java is very inflexible. Its generic handling is catastrophic, the book on good Java use is half about strange generic behaviors (fixed in Scala), you can restrict the lower and upper bound of generic types, you have much better control over the privacy of your data. You also have more levels of privacy available and can even control to which class family it applies; Java is so incredibly missing control.
    Do not forget the pathetic Java collections, or the actor support (Java even with Akka is much weaker)

    You can compose in FP, not in Java (many orders more flexible)

    And I am forgetting many other advantages, Java is to Scala much less than Assembler is to Java

  • Re: Forget Java

    by Richard Clayton,

    You just admitted that Scala is crammed full of features (way more than Java). More features = more complexity -- it's really that simple.

  • Amnemic classes

    by Hans Desmet,

    I'm worried this will lead some developers to amnemic classes: data, but no behaviour

  • Re: Forget Java

    by Serge Bureau,

    No I am not, Java has a lot more exceptions than features that you get to know.
    Plus, Scala is much more consistent, so most features work as you would think, contrary to Java.
    So because of this consistency, it is much simpler. Java will never reach that level as you cannot build on a shaky foundation.
    Actually, the only inconsistencies found in Scala are because of the Java link. Fortunately Scala smooths over a lot of the inconvenience.
    So, Java is much more complex.

  • It is contraproductive to my current state of coding

    by Ladislav Jech,

    Just a joke: I use lombok and my data structures look only like pojo with variables, the rest is generated on the fly by lombok. With Data class I will need to write more code: "extends DataClass", so it is contraproductive for me :-)))))

  • Re: Amnemic classes

    by Ladislav Jech,

    totally agree.

  • Re: It is contraproductive to my current state of coding

    by Michael Redlich,

    Hi Ladislav:

    You won't have to explicitly write out "extends java.lang.DataClass." It's part of the syntactic sugar similar to java.lang.Object. That's why I wrote, "consider its 'de-sugared' definition." java.lang.DataClass doesn't exist yet. It's a placeholder that Brian Goetz used for the example.

    Hope this helps...

    Mike.

  • Re: It is contraproductive to my current state of coding

    by Ladislav Jech,

    aha, I will read again. Thx for clarification.

  • Re: Amnemic classes

    by Ladislav Jech,

    On the other hand, Hans, when you have objects to be exchanged between 2 systems (so we serialize on one end and deserialize on the other end), we already use non-behavioural classes; these DTOs simply transport data, and that is the only function they have.
