
A Discussion With Neal Gafter on the Future of Java

Posted by Charles Humble on Sep 14, 2011

At the end of May I attended the inaugural What's Next Conference which took place at 'Le Grand Rex', in the centre of Paris. One of the two keynote speakers was Neal Gafter, who was a primary designer and implementer of the Java SE 4 and 5 language enhancements, and now works for Microsoft on the .NET platform languages. I was fortunate enough to be able to interview him for InfoQ, and the following is a transcript of our conversation. During his presentation, Neal expressed the view that Oracle's acquisition of Sun Microsystems was, on balance, a good thing for Java, remarking "innovation works best when someone is in charge", so I began by asking him his views on Oracle's handling of the Java community. 

Neal Gafter: Well, I think that the community is the least positive aspect of the Oracle acquisition.

I think there are a lot of things they're still working on. I mean, I can tell you lots of things that they say they're working on but haven't gotten to. They're trying to develop specifications in the open. They promised that -- what they basically said is from now on all JSRs will be operated in the open. All of the expert group mailing lists will be publicly readable.

And then they launched JSRs -- Project Coin and Project Lambda. And they're ongoing. And then they have public reviews. And none of the expert group discussion is readable.

I said [to them] I would be happy to give you detailed feedback on these as soon as I have the expert group discussion as a reference, because the rationale for a lot of the decisions that were made will be revealed and it will be understandable in the context of having the expert group discussions as a reference.

And the response was, "We're working on that, we'll get that to you". It's not that the expert group says, "We don't want this public". The expert group is happy to have the discussion public. And Oracle are saying they want it to be public, but they haven't done it.

So, the public review period closed. And there's still no access. They're working on a JSR for a revision of the Java language specification, and I went to review that. That's not even an expert group; it's just Alex Buckley and the Oracle language team fixing specification bugs by revising the specification.

So they [Oracle] said, "Here's a list of the bugs that we believe we've fixed in this version of specification, and here's the revised specification". And I went down that list and I went to the website to review the bugs -- and they're not publicly visible. So I don't even know what are the bugs that they think they fixed. How do I know?

I don't have enough information. And they said, "Oh, that's a mistake, they should be publicly available; we'll work on it". Then the public review period closes. And then a week or two later, they said, "Oh, the bugs are viewable now". But it's too late, right, to get any public feedback during the public review period.

Now, of course, they're happy to have feedback whether it's inside or outside the public review period. The public review period is a formal period of time. They'll be happy to take feedback at any time it's available. So, it's a small point in the particular case of the language specification. But still, they're not as outward-focused as Sun Microsystems was. They're much more inward-focused. They have their own developers. They don't expect to have [it] worked on by folks outside of Oracle as much as Sun Microsystems hoped to have that involvement.

Now as a practical matter, most of the work takes place inside the company anyway -- it always did and probably always will. But Sun very much did try to engage the community and involve them in the process, and that's not so much the case at Oracle. And I think they sort of have two left feet about it. They're just very clumsy. And I think they're learning.

 InfoQ: Do you think they're trying to fix that?

Neal Gafter: I think they're trying. I don't think it's maliciousness. I think it's a case of "never ascribe to malice that which can be explained by incompetence". I just think it's a community that's very different from the community they're used to dealing with.

And they're learning how to engage. It's off to a rocky start, but I think they really want to make it work. They really want to be good collaborators with the community and they're just trying to figure out how to do it. It is not so much technical problems as administrative problems inside Oracle.

If they want to announce something or they want to set up a public mailing list, well, they don't have the administrative mechanism for doing that. And the company is large enough that they can't just have people doing it themselves. They have to have a process by which such things are approved and enacted. And that process is not used to working with the kind of things they're trying to do right now.

So I think they'll work it out. I think they'll get better over time.

It's certainly uncomfortable where we are now, but I think they want to be good collaborators with the community. They want to build a community. They want that community to be there because they benefit from the community more than anyone else does.

It's in their best interest. So I wish it was better, and I give them a hard time frequently about it publicly, but I know they're trying. And I will continue to give them a hard time, even as they make improvements, because there's always room for improvement.

But I think they're going in the right direction.

 InfoQ: I'm sure you've had this question a lot of times, but it's interesting to me now that you are working for Microsoft: Obviously, the relationship between Microsoft and Sun was pretty terrible. It got better for various reasons, but to have someone working on C# and also quite actively involved in how Java shapes [up] is interesting politically if nothing else. So how does that work?

Neal Gafter: Well, it's not part of my job, really. I mean, Java has nothing to do with what I do for Microsoft. I do it because I care. I do it because I'm interested. I do it because I have a lot of friends that are involved in it. You know, it was a big part of my life for years and I care about where it goes. And there's a lot to learn from the way Microsoft does things that are valuable lessons for Java.

A lot of Java's growing pains are growing pains that C# has already gone through. A lot of the things that Java is trying to do today are things that C# wanted to do, and then did, and now does quite well. But not everything is perfect even in the C# world. I mean, there are things that we've learned. There are valuable lessons that Java could benefit from.

So there are things that I learned that I can take back to the Java world and say, for example, have you looked at what C# is doing with the asynchronous language feature? Have you looked at the way Lambda expression is working in C#?

You know, the way the concurrency mode operations work in C# and LINQ, for example. I think those are all very useful lessons for the Java world.

InfoQ: You had some advantages, I guess, in C# in that when you were adding some of the big features, the audience for C# was a bit smaller so it was easier to break backwards compatibility and those kinds of things. Or, is that more of a philosophical difference?

Neal Gafter: Well, I think there's a bit of philosophical difference. I'm guessing what you're talking about is Generics more than anything else.

InfoQ: Yes.

Neal Gafter: So just the background for your readers: When Java added Generics, they did so using Erasure.

In other words, it did not require deep virtual machine changes; it's all at compile time. At runtime all of the information, or most of the information, about Generics is erased from the actual objects. You can't distinguish at runtime an empty list of strings from an empty list of integers. They're represented exactly the same way at the runtime.
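The point about runtime representation can be checked directly. The following is a minimal sketch, not part of the interview; the class name `ErasureDemo` and the helper `sameRuntimeClass` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // Compares the runtime classes of two lists. Because type arguments are
    // erased, only the raw class (here ArrayList) is available at run time.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();
        System.out.println(sameRuntimeClass(strings, integers)); // prints "true"
        System.out.println(strings.getClass().getName());        // prints "java.util.ArrayList"
    }
}
```

Nothing in either object records whether it was declared as a list of strings or a list of integers.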

One of the reasons for doing that was that we wanted to be able to generify existing libraries -- the clients of the libraries and the implementers of the libraries independently. We didn't want to have to require a particular order. We didn't want to have to require vendors of libraries to have both a pre-Generics version and a post-Generics version.

And then if you build on the library, then you have to have the versions that build on each of those, right? In theory, it could get very complicated. I think in practice it would not have been as complicated for the library vendors as we feared it might have been.

When C# added Generics, they did not deprecate the old collection classes. Those collection classes are still part of the libraries. In fact, there's an inheritance relationship between the non-Generic collection interfaces and the Generic collection interfaces. The Generic ones, I believe, extend the non-Generic ones.

 InfoQ: Okay.

Neal Gafter: So that actually, when you're implementing the generic ones, you're implementing the non-generic ones as well. That gives you a certain level of interoperability. That means if you create a generic collection, you can pass it to something that expects the old-style collections.

You can't go the other way around. If you have an old-style collection, you can't pass it to something that expects a new-style collection. But it's not that hard to adapt it. It's not that hard to just create a new collection and put all the elements into it.
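The layering described here can be modelled in Java for illustration. This is a rough sketch of the idea only, not the actual .NET collection interfaces: `LegacySeq`, `Seq`, `ArraySeq`, and `InteropDemo` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-generics interface: elements are plain Objects.
interface LegacySeq {
    int size();
    Object getObject(int index);
}

// Hypothetical generic interface that extends the legacy one, so every
// generic collection is automatically usable as an old-style one.
interface Seq<T> extends LegacySeq {
    T get(int index);
}

final class ArraySeq<T> implements Seq<T> {
    private final List<T> items = new ArrayList<>();
    void add(T item) { items.add(item); }
    @Override public int size() { return items.size(); }
    @Override public T get(int index) { return items.get(index); }
    @Override public Object getObject(int index) { return items.get(index); }
}

final class InteropDemo {
    // Old-style API that knows nothing about type parameters.
    static String describe(LegacySeq seq) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < seq.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(seq.getObject(i));
        }
        return sb.toString();
    }

    // Going the other way needs an explicit adaptation step: copy each
    // element into a new generic collection, checking its type on the way in.
    static <T> Seq<T> adapt(LegacySeq old, Class<T> elementType) {
        ArraySeq<T> result = new ArraySeq<>();
        for (int i = 0; i < old.size(); i++) {
            result.add(elementType.cast(old.getObject(i))); // ClassCastException on a mismatch
        }
        return result;
    }

    public static void main(String[] args) {
        ArraySeq<String> names = new ArraySeq<>();
        names.add("a");
        names.add("b");
        System.out.println(describe(names));           // generic collection passed to old-style code
        Seq<String> copy = adapt(names, String.class); // old-style view copied into a generic one
        System.out.println(copy.get(1));
    }
}
```

The copy in `adapt` is the "create a new collection and put all the elements into it" step; it costs one pass over the data.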

I think it wasn't explicitly a goal to make the engineering easier for JSR-14. It wasn't explicitly one of the goals to say, "We want to minimize the amount of work to add Generics". But the amount of work to add Generics the way Java did is much, much less work than the way it was done in the Microsoft platform because the Microsoft platform has not only deep changes in the compilers, but deep changes in the virtual machines and the runtime libraries. I mean, it's throughout the system. It's a more radical and extensive change to the system.

Technically, it doesn't break any existing libraries but, if you want to migrate your library to use Generics, you have to go through a step where you actually make the effort to migrate them. I don't think that it would have been a great burden if Sun had taken the same approach that C# had taken -- to the users. It would have been [a great effort] to Sun.

Sun would have a lot more work to do, and they actually didn't have the resources to do that.

 InfoQ: Right.

Neal Gafter: At least not in that time frame. It would have taken a couple of more years or, you know, another half dozen people in order to actually make it happen the way Microsoft did it, because Microsoft has always had more engineering resources on the .NET platform than the equivalent resources in Sun Microsystems. Now, Sun Microsystems per capita probably does get some more engineering work done than Microsoft in terms of the feature points.

I think Microsoft has a much more thoroughly designed, tested, well-integrated system typically -- it's definitely more carefully reviewed, and all of the pieces of the system work together much more cleanly than is the case with the resources that Sun has been able to devote. For example, in 1.1 Sun Microsystems simultaneously added inner classes and serialization to the platform. I wasn't actually working on those. But from what I understand, those teams really treated them as independent or orthogonal features, but they're not.

They're not orthogonal features. So the interaction between them has never been very comfortable. I mean, there really are weak points at the edges between those features. And if you have more time, more resources, much more testing and review, it would have taken longer or it would have been more expensive to develop, but I think you might end up with something that's better. My experience with C# is that it's a much more solid design than Java in a lot of ways and the Generics is one example of that.
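One commonly cited rough edge between the two features is that a non-static inner class carries a hidden reference to its enclosing instance, so serializing the inner object drags the outer one along. The sketch below illustrates that behaviour; all class and method names are invented for the example, and it is not necessarily the specific weak point Neal had in mind.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class InnerSerializationDemo {
    // Deliberately NOT Serializable.
    static class Outer {
        private final String name;
        Outer(String name) { this.name = name; }

        // Non-static inner class: each instance holds an implicit reference to its Outer.
        class Inner implements Serializable {
            int value = 42;
            String label() { return name + ":" + value; } // uses the enclosing instance
        }

        // Static nested class: no implicit reference to any Outer instance.
        static class Nested implements Serializable {
            int value = 42;
        }
    }

    // Attempts default Java serialization into an in-memory buffer.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean innerSerializes() { return canSerialize(new Outer("outer").new Inner()); }
    static boolean nestedSerializes() { return canSerialize(new Outer.Nested()); }

    public static void main(String[] args) {
        System.out.println(innerSerializes());  // prints "false": the hidden Outer reference is not serializable
        System.out.println(nestedSerializes()); // prints "true"
    }
}
```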

So does that answer your question?

InfoQ: Yes. There's a debate that's been running in Java for a long time and I guess it also runs in .NET, which is essentially whether you should keep adding features. Is there a point where your language becomes sufficiently complex as a consequence of adding new features and you should just stop? The example that everyone throws up is C, which really hasn't evolved a great deal for some time and is still a widely used language, possibly because of that. And the contrast is with C# particularly, which has obviously been through a number of very rapid changes. Where do you stand on that? What's your view on that as an argument?

Neal Gafter: I think change is necessary but it has to be managed very carefully. It's more difficult in Java than it is in C#, for a number of reasons.

Number one: resource constraints.

Number two: there has not been for Java the kind of long-term planning for the language as there has for C#.

C# has a very clear -- I mean, there's Anders Hejlsberg. He's the architect for the language and the platform. He has a very strong design sense. He has a very light touch, actually, in terms of the way he collaborates with the people who work with him. But there is a long-term view of where the language should go and how. And every change that's considered is considered first with respect to whether it takes the language in that direction or whether it's just another lump on the side because someone would like it.

There are a lot of things that you could do that someone will be happy with, but that most people will not benefit from. And when you add something that someone doesn't benefit from, it's actually a negative for them. Even though they don't have to use it, or look at it, or care about it, it makes the system more complicated for them.

So, one of the ways we think about that inside Microsoft is that every language proposal starts out at minus 1,000 points. And you have to fight past that minus 1,000 points before you're even in positive territory and worth being considered to be really added to the language. At one point, I understand it was minus 100 points, and now it's right at minus 1,000. So, the bar moves so that it's harder. And it should be harder to add to a language.

And even the C programming language has changed. There's a standards committee. New versions of the standard come out. And there are real changes, real substantial changes to the language over time. The C committee is much more conservative about those changes than, for example, the C++ standards committee has been.

The C++ standards committee has been much more generous in terms of what they would consider putting in the language and what they would reject. You know, I think that there's certainly cause for conservatism, but I think it's unlikely that any language that we use today could remain vital without being open to some level of changes.

Also for Java, one of the reasons that change has been more difficult is that a lot of the changes that went in were generally speaking good ideas, but they were added in a way that does not consider the future evolution of the language. It sort of goes back to my point of long-term view.

Generics is probably a very good example, where there was concern about migration of libraries as people transitioned to the version in which Generics were added. Erasure was used to make that particular transition easier. The problem is, Erasure makes it more difficult to add other language features in the future. For example, adding function types to the programming language is much more difficult with Erasure as part of Generics.

InfoQ: You argued at one point, if I remember rightly, on a blog post that you could still fix that. You could do a version of Generics that didn't rely on type Erasure in Java. Have I got that right? I think I remember reading that.

Neal Gafter: Right. So I proposed you can actually have both. But you would end up having non-Generics and you'd have erased generic type parameters and you'd have reified ones. But you would never get rid of Erasure. There would be some type parameters that are always erased and there would be some type parameters that are never erased.

The problem is that, for the ways in which Erasure interferes with the future evolution of the language, that doesn't help you; it's still there. And the way in which it interfered with languages before, it still interferes with them. For people who specifically want to be able to use Generics and have them reified because there are specific things that they want to be able to do, it will help them. But you're still handicapped by Erasure for a lot of things. And I know that there are people that are looking at [the question of] what's the minimum breakage we could get to reify what we have today as Generics. Could we do that without breaking too much?

It's a very difficult problem, but there are some people who have confidence that they might be able to make some progress on that. If they succeed, there would be a release where some kinds of Generic code that's arguably technically broken but has always worked in the past would no longer work. But then you would have reified Generics in the future. If that's possible, I think it would be wonderful, but I'm a little skeptical.

 InfoQ: Yes. I'm struggling to see it myself, but interesting. Stephen Colebourne amongst others argued that Oracle should produce a non-backwards compatible version of Java, effectively. So, go and fix the things we got wrong and possibly maintain the two versions. So...

Neal Gafter: Okay. Microsoft has already done that. It's called C#. Okay. I mean, I would ask the question, why should it be Oracle?

Why should it be called Java? You know, what does it have in common with Java? It's not the Java programming language. If it's not backwards compatible, it's not the Java programming language. If you want it to run on the Java VM and you want reified Generics, you have a problem because reified Generics require VM support.

So if it's a new VM and a new programming language, what does Oracle have to do with it?

You know, that would be my question. I certainly think there's room for languages other than Java. If I didn't believe that, I wouldn't be working for Microsoft right now. But even on the Java Virtual Machine, there are some excellent alternatives to Java like Scala, for example.

You know, if I were working on the VM and had my choice of language to use, Scala would be very high on my list of preferences.

 InfoQ: What trends do you see emerging in languages that we should be paying attention to? You mentioned Scala as an OO functional hybrid thing, and you have a similar thing I guess with F# in .NET. What do you think is the next thing beyond functional?

Neal Gafter: Well, beyond functional? I think there's a lot of legs on that. I'm sort of ducking the question by answering that way.

But I really do think that -- especially for Java and C# and languages that started out in the more imperative style -- there's a lot of mileage left in exploiting ideas from the functional programming community. And I think it's going to be years before that's all tapped out.

And I don't think Java or C# can go as far as you might wish them to. I don't think they'll ever appear to be functional programming languages. I mean, they have a certain style that is naturally imperative. And there's a lot you can gain from using functional idioms, especially from programming with immutable data. I think trying to really take advantage of concurrency with good old locks and signals -- you know, the sort of stuff that was in the very earliest versions of Java -- doesn't scale as a practical matter.

You can't build large systems with concurrency that way. Well, maybe you can, but I can't. And [for] the bulk data operations and fork join concurrency programming with immutable -- the problem is shared mutable state.

And the solution is you don't share or don't mutate.

And the functional style says you don't mutate. You don't mutate your state. You work with immutable data. And there are things to be done in the Java language to support that style. There are things to be done in the libraries, a lot to be done in the supporting libraries. We are a long way from those styles of programming being widely used.
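As a small illustration of the "don't mutate" option, the sketch below shares one immutable value between several tasks without any locking. It is only one way to write this, and the names (`ImmutableSharingDemo`, `Point`, `sumOfTranslatedX`) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ImmutableSharingDemo {
    // Immutable value: final class, final fields, no setters.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        // "Changing" a point returns a new value and leaves the original untouched.
        Point translate(int dx, int dy) { return new Point(x + dx, y + dy); }
    }

    // Every task reads the shared origin and produces its own result.
    // Nothing shared is ever written, so no locks are needed.
    static int sumOfTranslatedX(Point origin, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Point>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int dx = i;
                Callable<Point> task = () -> origin.translate(dx, 0);
                futures.add(pool.submit(task));
            }
            int sum = 0;
            for (Future<Point> f : futures) {
                sum += f.get().x;
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Point origin = new Point(10, 0);
        System.out.println(sumOfTranslatedX(origin, 5)); // 10+11+12+13+14, prints "60"
        System.out.println(origin.x);                     // prints "10": the shared value is unchanged
    }
}
```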

 InfoQ: So do you think that our approach to programming effectively needs to change to exploit multi-core and the like?

Neal Gafter: Yes. And I don't think it can happen as a revolution.

I think as a practical matter it happens. You know, we find that we have performance problems or reliability problems and we replace subsystem by subsystem using styles that are more scalable.

InfoQ: Do any of the current approaches like the Actor model or the message passing thing that Erlang and the like do, seem like a particularly good fit to you for Java or for C#?

Neal Gafter: I think they could be, but I think it requires that the language be taken in directions that nobody currently has plans to. One of the things that makes the Actor style worthwhile is pattern-matching, which Scala gets with case classes.

No one's considering doing something like that for Java today. I think it would be worth considering for Java, but I think there needs to be a “big picture” plan for the long term and I can easily see that that would be one component of it for Java.

But you're never going to get there without a plan, without a long-term plan. I don't think it's something whereby someone could say, "Well, in version x we'll add Actors". There needs to be a long-term plan. It's awkward right now to build classes representing immutable data in Java. That would have to be easier. It would have to be easier to do pattern matching. I don't think serialization in its current form is the ideal way of passing immutable data around but maybe it could be, right? If it's applied to immutable data types that don't exist today, maybe it could be the right mechanism for that. But for Java in the form it is today, not yet. It doesn't seem to be a good fit.
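To make the awkwardness concrete, here is a hedged sketch of an actor-like counter in plain Java without pattern matching. The message classes need hand-written boilerplate and dispatch is a chain of `instanceof` tests and casts. Everything here (`CounterActorDemo`, `Add`, `Stop`, `CounterActor`) is a hypothetical illustration, not an existing actor library.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class CounterActorDemo {
    // Immutable messages, written out by hand.
    static final class Add {
        final int amount;
        Add(int amount) { this.amount = amount; }
    }
    static final class Stop { }

    static final class CounterActor implements Runnable {
        private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        private int total; // touched only by the actor's own thread while it runs

        void send(Object message) { mailbox.add(message); }

        @Override public void run() {
            try {
                while (true) {
                    Object message = mailbox.take();
                    // Without pattern matching, dispatch is a chain of type tests and casts.
                    if (message instanceof Add) {
                        total += ((Add) message).amount;
                    } else if (message instanceof Stop) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        int total() { return total; }
    }

    static int runExample() {
        CounterActor actor = new CounterActor();
        Thread thread = new Thread(actor);
        thread.start();
        actor.send(new Add(2));
        actor.send(new Add(3));
        actor.send(new Stop());
        try {
            thread.join(); // after join, reading total from this thread is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return actor.total();
    }

    public static void main(String[] args) {
        System.out.println(runExample()); // prints "5"
    }
}
```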

I think it would certainly be valuable to be considered as part of the future. But I haven't heard anyone in Oracle talking about that for Java. So certainly not for Java 7 or 8. If something like that were to happen, it would be beyond that time frame.

But I think if your desire is to program in that style -- and there are a lot of benefits to programming in that style -- then you're (especially today, anyway) much better off choosing a language in which that's natural. There are just too many changes that would have to happen to a language like Java or C# before that style would be easily expressed in the language. So, use Scala for that style of programming, or F# on the .NET platform.

 InfoQ: I guess the other big trend that's common to the .NET platform and the Java platform is support for languages other than Java and C#. With both, it was kind of there but Microsoft emphasized it more in the beginning.

Neal Gafter: Of course, you can see that with Java. The name java.lang.Object, right?

InfoQ: Right, yes.

Neal Gafter: It's supposed to be object for all languages but that's not what was -- no one had that in mind at the very beginning.

InfoQ: Yeah. So John Rose has been leading that project for Java. Of the other things that have been looked at as part of that Da Vinci Machine project -- things like tail recursion -- which ones would you do next if you could do the next one? What would it be?

Neal Gafter: There are a couple of different things that you might be trying to accomplish in that whole direction, support for dynamic languages. One of them is things in the VM to make it easier to implement a dynamic language, and that's mostly what the Da Vinci project has been focusing on. You know, what can we put in to support individual programming languages? But another thing would be to make it easier for languages to interoperate with each other on the platform. And that was one of the focuses of Microsoft with the DLR and the addition of dynamic as a type to support language interoperability in the C# programming language.

The idea is that what Microsoft has done, it has made it so that these languages not only can execute efficiently, they have support for efficient dynamic dispatch, but there's also a meta-object protocol so that objects from one dynamic language can have some reasonable semantics when used from another programming language, and so that you can interoperate between languages with some reasonable semantics.

So I think that adding a meta-object protocol, either at the VM or a veneer above the VM possibly, would be valuable for interoperation between dynamic languages.

 InfoQ: It's the latter you did in .NET, isn't it? It's a library above the CLR.

Neal Gafter: It is. Well, actually there is some support in the CLR, but mostly it's a library above the CLR, yes.

And then the second thing would be support in the Java programming language. If it was done in the style of C#, it would be basically a type called "dynamic" that basically means, "Figure out the semantics at runtime. Let me do anything I want at compile time, but figure out the semantics at runtime". That would allow you to interoperate with objects that were created by dynamic programming languages.
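Java has no `dynamic` type, but the "figure out the semantics at runtime" idea can be loosely approximated with reflection. The sketch below is only an analogy under that assumption: it is not how C#'s `dynamic` or the DLR is implemented, and `DynamicCallDemo` and `invokeByName` are names made up for the example.

```java
import java.lang.reflect.Method;

final class DynamicCallDemo {
    // Looks up a public no-argument method by name at run time and invokes it.
    // The compiler knows nothing about which method will be called.
    static Object invokeByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeByName("hello", "length"));      // prints "5"
        System.out.println(invokeByName("hello", "toUpperCase")); // prints "HELLO"
    }
}
```

A misspelled method name fails only when the call is attempted, which is the trade-off a dynamic type makes: the compiler accepts anything and the check moves to run time.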

And so that I think is a very useful direction, to have a dynamic meta-object protocol.

There are a lot of things I'd love to see in the VM to support dynamic language, and yes tail recursion is one of them. It's not a big one, but for some programming languages it makes a big difference.

Also segmented stacks. Right now, this is a problem both for Java and C#. These platforms remove a lot of the reliability issues with native languages like C and C++. You don't have index out of bounds errors occurring because your indices are checked and you get an exception. You can't write over memory that's been freed, or write into unallocated memory and so on, because you can't manage the memory yourself; the garbage collector does it for you. But one of the things that these platforms don't do very well right now is deal with stack overflow. And stack overflow doesn't always occur because of errors; it doesn't always occur because you have some infinite recursion that you didn't intend to have.

Stack overflow sometimes occurs because you have something that is naturally recursive. I work with compilers and it's easy to crash the Java compiler just by writing, you know, int i = 1+1+1+1 and just do that a few thousand times. And the semantic analyzer will be trying to analyze that, or the parser will be trying to parse it and it will just blow the stack. And what happens is, the process crashes.

You know, there's no good recovery from that. And you can fix it by [saying], "Well, we start over again but I'll just allocate more stack". The problem is, you can't necessarily know ahead of time how much stack to allocate to any given thread.
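The effect is easy to reproduce with any naturally recursive computation over deep input. The sketch below models a long `1+1+1+...` chain as a linked list and evaluates it three ways: recursively, with a loop, and recursively on a thread that requests a larger stack. The names are illustrative, the 256 MB figure is an arbitrary guess, and the requested stack size is only a hint that a JVM is allowed to ignore. On the JVM the overflow surfaces as a `StackOverflowError`.

```java
final class DeepRecursionDemo {
    // A long chain of operands such as 1+1+1+..., modelled as a linked list.
    static final class Node {
        final int value;
        final Node rest; // null at the end of the chain
        Node(int value, Node rest) { this.value = value; this.rest = rest; }
    }

    static Node chain(int length) {
        Node n = null;
        for (int i = 0; i < length; i++) {
            n = new Node(1, n);
        }
        return n;
    }

    // Natural recursive evaluation: one stack frame per operand.
    static int sumRecursive(Node n) {
        return n == null ? 0 : n.value + sumRecursive(n.rest);
    }

    // Loop-based evaluation: constant stack use regardless of chain length.
    static int sumIterative(Node n) {
        int total = 0;
        for (Node cur = n; cur != null; cur = cur.rest) {
            total += cur.value;
        }
        return total;
    }

    // Returns the sum, or -1 if the recursion exhausted the current thread's stack.
    static int trySumRecursive(Node n) {
        try {
            return sumRecursive(n);
        } catch (StackOverflowError e) {
            return -1;
        }
    }

    // "Just allocate more stack": run the recursion on a thread with a requested stack size.
    static int sumRecursiveWithStack(Node n, long stackBytes) {
        final int[] result = new int[1];
        Thread t = new Thread(null, () -> { result[0] = trySumRecursive(n); }, "deep-recursion", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        Node big = chain(1_000_000);
        System.out.println(sumIterative(big));                            // prints "1000000"
        System.out.println(trySumRecursive(big));                         // -1 if the default stack was too small
        System.out.println(sumRecursiveWithStack(big, 256L * 1024 * 1024)); // outcome depends on the JVM honouring the hint
    }
}
```

Choosing the stack size up front is exactly the guess the interview describes: too small and deep inputs fail, too large and every thread pays for space it rarely uses.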

And the advantage of segmented stacks, like those in Google's Go system, is you don't have to decide ahead of time. It's dynamically adjusted based on the needs of the program. And it makes some kinds of software much easier to write.

Threads are expensive in Java, and the reason you pool them at all is because they're expensive. If they weren't expensive, you wouldn't pool them. And the reason they're expensive is because the stack is pre-allocated. There's a huge stack, there are a lot of resources. You don't want to run out of it, right, so you allocate this big stack thing. If you have segmented stacks, the threads themselves will be much cheaper.

And with threads cheaper, there are a lot of things that become much easier. You know, asynchronous -- a lot of what people think of as asynchronous programming and state machine writing and co-routines. If threads were sufficiently cheap, you wouldn't need to do any of those things. You just create a thread and let the thread wait because it's not consuming any resources when it's waiting.

It doesn't even have stack allocated in your virtual address space. So I think it would be valuable to consider doing something like that for Java.

That would have a subtle, but I believe deep, impact on the platform that would be very beneficial. And it would have an impact on the sorts of things that you consider that you need or don't need as you continue to evolve the platform. So that would be one that I would vote for -- segmented stacks. Instead of, for example, co-routines at the VM level.

 InfoQ: Thank you very much.

About the Interviewee

Neal Gafter was a primary designer and implementer of the Java SE 4 and 5 language enhancements, and his Java Closures implementation won an OpenJDK Innovator’s Challenge award. He continues to kibitz on the language changes in progress for SE 7 and 8. Neal previously worked on Google’s online Calendar. He was a member of the C++ Standards Committee and led the development of C and C++ compilers at Sun Microsystems, Microtec Research, and Texas Instruments. Today, Neal works for Microsoft on .NET platform languages. Neal is coauthor of "Java Puzzlers: Traps, Pitfalls, and Corner Cases" (Addison Wesley, 2005). He holds a Ph.D. in computer science from the University of Rochester.


Again generic/erasure! by arnaud m

Generics in java are really useful, even with erasure.
I don't understand why people always say it's a major java problem.

For instance, I would rather Oracle engineers spend time implementing method references than "runtime" generics.

Re: Again generic/erasure! by Zhao Jeffrey

It's true that generics with type erasure are useful, compared to nothing.

But it's really painful when you need the type information for programming, especially if you've learnt C#. You get a lot of useful programming patterns and meta-programming abilities from runtime generic support in .NET.

Re: Again generic/erasure! by arnaud m

Can you give examples or links about how C# developers use this runtime generic support?

Re: Again generic/erasure! by roberto dell'oglio

Try to do in Java what DI containers do in C# and you will understand the real value of C# generics.

Re: Again generic/erasure! by Zhao Jeffrey

One of the simplest usages:

T[] CreateArray<T>(int size) where T : new()
{
    var array = new T[size];
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = new T();
    }
    return array;
}
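For comparison, a Java counterpart has to receive the element type and a factory explicitly, because neither `new T[size]` nor `new T()` compiles when `T` is erased. This is a minimal sketch of one common workaround; `CreateArrayDemo` and `createArray` are names chosen for the example, and it assumes a reference element type (a primitive class such as `int.class` would fail the cast).

```java
import java.lang.reflect.Array;
import java.util.function.Supplier;

final class CreateArrayDemo {
    // The caller supplies what the runtime cannot: the element class (to build
    // a correctly typed array) and a factory (to construct each element).
    @SuppressWarnings("unchecked")
    static <T> T[] createArray(Class<T> elementType, int size, Supplier<? extends T> factory) {
        T[] array = (T[]) Array.newInstance(elementType, size);
        for (int i = 0; i < array.length; i++) {
            array[i] = factory.get();
        }
        return array;
    }

    public static void main(String[] args) {
        StringBuilder[] builders = createArray(StringBuilder.class, 3, StringBuilder::new);
        System.out.println(builders.length);                                   // prints "3"
        System.out.println(builders.getClass().getComponentType().getName()); // prints "java.lang.StringBuilder"
    }
}
```

The two extra parameters are the information that reified generics would otherwise carry for free.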

InfoQ.com and all content copyright © 2006-2013 C4Media Inc.