Bio: Tim Ellison has worldwide responsibility for Class Library and Open Source Engineering at IBM. He has contributed to commercial implementations of Smalltalk, IBM VisualAge Micro Edition, Eclipse, and the Java SDK over a period of more than twenty years. He has broad knowledge of high-performance runtimes, open source methodologies, and development environments.
1. My name is Charles Humble from InfoQ and I am here with Tim Ellison, IBM Java Development Team Leader, based at the Hursley Software Lab in the UK. Tim, could you start by telling us a little about yourself and your work at IBM?
I am Tim Ellison, as you said, based here in Hursley, which is the Java Technology Center for IBM in the UK. It’s a worldwide organization so there are teams in all the geographies around the world and I look after the class library team and have a technical role across all of Java SE delivery as well.
2. So we're here to talk about the future of Java. I guess we should probably start by tackling the question of whether we should carry on adding features to Java at all, or whether we have reached a point where the language has become sufficiently complex, as a consequence of adding more and more features to it, that we should simply stop. The example that everyone throws up is C, which has evolved relatively slowly and is still a widely used, and very popular, language. And we tend to contrast that with C++, which is also pretty widely used, but I guess has a lot of features which tend not to be used and perhaps weren't as good an idea as they seemed at the time. What's your view on that in the context of Java?
It does break down into different schools of thought really. There are those, as you say, who think that adding new features to the language will make it more complex, and there are those who believe that adding new features is necessary to maintain the vitality of the language and to attract new developers to Java.
There’s probably not a blanket answer for that; it’s a case of evaluating each of the features on its individual merits and arguing to bring it into a language depending on what it brings. It’s certainly true that, as a Java implementer, there are lots of features that we can bring to Java implementations which don’t have an impact on the API or the language itself, but deliver real benefit to IBM customers. And so we look at some of the enhancements to garbage collection technology, to the JIT compiler, to the diagnosability of defects and crashes, and so on and so forth, that mean there’s still a lot of investment that can be made in Java without adding to the complexity of the programming model itself.
3. One of the other fashions we’re seeing is for mixing different programming languages in one project; so Java 7 saw some moves to accommodate dynamic languages through the invokeDynamic bytecode instruction, and I know Oracle is looking to add a meta-object protocol in Java 9. Are there specific things that should be added to the JVM to support other styles of programming, such as functional programming, for instance?
So, functional programming, as you know, doesn't modify state a great deal; it's more about the organization and dispatch of functions in the programming language itself. That's in comparison to Java's object-oriented approach, which keeps the state and the functionality together in types, where sometimes the state of the object can modify the behavior of the type. And that's a sort of disconnect between what you'd expect in functional programming and object-oriented programming.
So yes, there are different ways of dealing with it. Certainly things like Clojure will manage state through pure functions, or you can be more purist, like Haskell. But at the end of the day, a lot of these languages tend to run on the JVM because of the many years' investment that's been made in the VM itself as a runtime engine; it's delivering for these languages, and similarly a lot of them are using the core libraries as well and want to interact with those core libraries.
So when you're looking to implement some of these new types of languages, like functional languages, there is a choice between mapping the characteristics of that language onto the existing Java implementation in the JVM, and doing that as efficiently as possible so that the existing JVM and JIT technology will execute it most effectively; or undertaking the relatively heavyweight operation of modifying the JVM to accommodate different types of programming model. So invokeDynamic, yes, that was introduced in Java 7. To do functional programming, I guess you'd want things like tail recursion and continuations; I don't really see them on the horizon any time soon, at least not in the next release or two.
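The MethodHandle machinery that underpins invokeDynamic can be exercised directly from Java 7 code. This is a minimal sketch, not anything from the interview; the class and `square` method are hypothetical, and it only shows the "function as a first-class value" flavour that language implementers build on:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class IndySketch {
    public static int square(int n) { return n * n; }

    public static void main(String[] args) throws Throwable {
        // invokedynamic bytecodes are emitted by compilers, but the
        // MethodHandle API they rest on (new in Java 7) can be used
        // directly: look up a method at runtime and invoke it as a value.
        MethodHandle mh = MethodHandles.lookup().findStatic(
                IndySketch.class, "square",
                MethodType.methodType(int.class, int.class));
        int result = (int) mh.invokeExact(7);
        System.out.println(result); // prints 49
    }
}
```

Language runtimes layered on the JVM use this kind of handle, cached at an invokedynamic call site, rather than reflective dispatch.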
4. Let’s talk about the Java 8 plans. Java 8 is due to ship probably around the middle of 2013 and there are three, I guess, big features in there. Linking to functional in some ways, I guess the most important is Lambda. Could you give us a description of that?
Lambdas are a new syntax that's going to be introduced in Java 8 to let you define what you can think of as anonymous methods. So you'll be able to define a method in the body of an existing piece of code, assign it to a variable, and pass it around as a function. You can do that in a limited way today, in Java 7, with anonymous inner classes; you'll be familiar with passing an anonymous inner class to an existing method, perhaps as a visitor or a listener, that sort of pattern. That has some limitations, not least the verbosity of the boilerplate code that you need to write to get that anonymous inner class, but also it necessarily has a reference to the enclosing type, and you can find yourself leaking memory by having lots of these references around, for perhaps listeners that are no longer listening to the events that they needed to originally.
So with the lambda implementation, they'll come with their own lexical scope; they won't have those references unless you're explicitly referencing things outside of their scope. And then there'll be augmentation of existing APIs in the class libraries; for example, the collection hierarchy will allow you to pass in some of these functions and apply operations to sets of data. So it will be a mechanism not only for the developer to reduce the amount of typing that they do, but also, as an implementation, a way of expressing their intent of doing an operation on the data, which we can optimize as implementers through these new extensions.
As I say, typically it's things which have references to variables that are in a context. Today, with anonymous inner classes, all of those variables would have to be declared final, and so you can't modify them outside of the context of the anonymous inner class. With true lambdas there will be references back to the context in which they were created. So they'll be used in contexts such as MapReduce; that's the example, where you're passing in functions which transform data, coalesce data, and do operations on potentially very large data sets.
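To make the comparison concrete, here is a sketch of the two styles side by side, using the stream syntax that eventually shipped in Java 8. The class and method names are hypothetical illustrations, not APIs from the interview:

```java
import java.util.Arrays;
import java.util.List;

public class LambdaSketch {
    // Sum of squares in the map/reduce style: the lambdas passed to
    // map and reduce are the "anonymous methods" described above.
    static int sumOfSquares(List<Integer> values) {
        return values.stream()
                     .map(n -> n * n)             // transform each element
                     .reduce(0, (a, b) -> a + b); // coalesce to one result
    }

    public static void main(String[] args) {
        // The pre-Java-8 equivalent needs an anonymous inner class, with
        // the boilerplate and enclosing-instance reference noted above.
        Runnable oldStyle = new Runnable() {
            @Override
            public void run() {
                System.out.println("anonymous inner class");
            }
        };
        // The lambda form carries no implicit reference to the enclosing
        // instance unless it actually uses one.
        Runnable newStyle = () -> System.out.println("lambda");

        oldStyle.run();
        newStyle.run();
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4))); // prints 30
    }
}
```

The `sumOfSquares` pipeline is exactly the "expressing intent on the data" point: because the operation is stated declaratively, the library is free to optimize how it traverses the collection.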
Sure. So, extension methods are, again, related to the collections hierarchy; we're talking about extending the collections hierarchy to take functions as parameters. That will necessarily mean modifying the interface descriptions, that is, the definitions in the collection hierarchy. And because they are established and well-used interfaces, that clearly has to be done in a way which is binary compatible, to ensure that existing programs are not broken. So extension methods are a way of simulating an extension to an existing interface: they allow you to specify a default implementation, if the type implementing that interface doesn't naturally implement that particular method.
So again, as a JVM implementer, you'd resolve the method look-up for a type, today, by traversing up to Object looking for an implementation of that interface method. If we don't find it, it's a linkage exception. With extension methods we're going to then go back and have another look to see if there's a default implementation, and choose the most specific default implementation. It's not a huge deal in terms of the efficiency of the implementation, but it will have to be done reasonably carefully, and I think it will be quite interesting to see how people choose to use that new capability. I suspect there may be new and interesting ways in which they find to exploit that functionality.
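The resolution behaviour described above can be sketched with the default-method syntax that Java 8 introduced. The `Greeter` interface here is a hypothetical example, not an API discussed in the interview:

```java
public class DefaultMethodSketch {
    interface Greeter {
        String name();
        // A default implementation: existing implementors of Greeter stay
        // binary compatible because they need not provide greet() themselves.
        default String greet() {
            return "Hello, " + name();
        }
    }

    // This class only implements name(); callers can still invoke greet()
    // because resolution falls back to the interface's default body.
    static class SimpleGreeter implements Greeter {
        public String name() { return "world"; }
    }

    // A class that does provide greet() wins over the default, matching the
    // "most specific implementation" resolution described above.
    static class LoudGreeter implements Greeter {
        public String name() { return "world"; }
        public String greet() { return "HELLO, " + name().toUpperCase(); }
    }

    public static void main(String[] args) {
        System.out.println(new SimpleGreeter().greet()); // Hello, world
        System.out.println(new LoudGreeter().greet());   // HELLO, WORLD
    }
}
```

This is what lets `java.util.Collection` grow methods like `stream()` without breaking every existing implementation of the interface.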
7. And then the third big ticket item in Java 8 is Project Jigsaw, the modularity framework. Obviously IBM is a big user and supporter of OSGi, so I’d be interested in your views on Jigsaw. As I understand it, Sun and now Oracle's, main argument for having it is that they needed support for split packages; so where items could reside in different packages or different modules but still end up on the same class loader at runtime. Given that, do you accept that there is a genuine need for Jigsaw?
There's certainly a need for modularizing SE; it is growing, and any large system, such as SE, will benefit from the sort of decomposition that you get through modularization. That allows you to selectively load modules and resolve any conflicts through more specific, targeted dependencies between them. It also makes the type look-up a lot more efficient, and so on. So, yes, I think there's definitely a real driver there for modularizing SE itself.

Now, to bring it round to OSGi, which was part of your question: IBM, you're right, invests in all sorts of technologies, including OSGi, and evolves those technologies to adapt to the needs of our customers and the applications that they're writing. And OSGi has led the way in modularity for a long time. Jigsaw is still under development, and we are participating, through OpenJDK, in the evolution of SE modularity. So IBM leads the Penrose project in OpenJDK, which is specifically tasked with looking at interoperability between OSGi and the Java module system in Java 8. And I think that getting modularity into SE really will unlock another tranche of opportunities for enhancing some of the implementation that we have in footprint reduction. So things like shared classes already exist in IBM Java 7, but we'll be able to go that much further once we have modularity in there; ahead-of-time compilation, again, is already in Java 7 from IBM, and we'll be able to do a lot more of it with modularity, and so on. So I think a lot of these things really are opening up more and more different places in which people can deliver on the technology and get some real benefits from it.
8. Oracle have also outlined some longer-term plans for Java. As well as the meta-object protocol we mentioned previously, there are plans for a unified type system, removing the primitives (perhaps for Java 10), and a move from 32-bit to 64-bit addressable arrays for larger data sets. Are any of these particularly interesting to IBM and its customers?
It's good to have the discussion of all these things which are a bit further out, and IBM has lots of ideas about things that we'd love to see in a Java 9 and 10 timescale. At this point, it's sort of a case of looking at industry trends and seeing what sort of thing you would expect to see in that timescale. On my radar, really, is obviously Cloud computing; a lot of people are building applications which are very Cloud oriented, and there's the question of how Java will behave in that sort of environment. Virtualization, likewise. You know, a lot of what I see happening in the Java 9 / Java 10 timescale is Java adapting and picking up some of the capabilities that will allow it to remain relevant and important in some of these new technology areas.
9. So you think there are specific things that need to happen for Java in the Cloud context, over and above what has already been developed in the context of Java EE. So for instance, you think there are things that need to happen in the runtime and in the virtual machine?
Yes, obviously Java is a virtual machine, and often your Cloud is running on a virtualized system, either a private virtualized system or a public one. But where you have these two different, potentially competing, layers of virtualization, there needs to be some collaboration between them; otherwise you get what they call the "stack of liars", where Java believes it has so many CPUs and so much memory available, and so on, and the virtualized system is misrepresenting those to Java. So undoubtedly, resource constraints, dynamically moving applications between virtualized boxes, and that sort of thing are all, again, things which IBM is actively watching, interested in where our customers are going, and we'll be adapting that sort of technology.
So, again, as an implementer, the obvious one is that we have to ensure that we use all of those cores. As a Java user, the API may well evolve over a period of time to provide APIs that allow users to recognize that. But in the first instance, IBM Java has been running on IBM Power Architecture for many, many years, which has had n-way CPUs well before a lot of the competition. And so we already have an architecture which is highly scalable to high thread counts, and we already exploit that kind of behavior; but some of these CPUs may be virtualized, some of them may be very specialized, and the VM has to be able to adapt and be aware of the sort of hardware that it's running on; and the memory associated with those systems is also highly variable as well.
And so, depending on whether you're accessing memory local to the CPU or a shared cache, you can see significantly different characteristics. So, again, we have a GC policy, the Balanced GC, which has just been released and is already NUMA-aware (that's Non-Uniform Memory Access aware), and will adapt its collection depending on where it's seen the best return for its collection.
I'm not familiar with the technical details of Garbage-First (G1) from Oracle, but I can tell you about Balanced GC, which splits up the Java heap into lots of different regions and, based on the different characteristics of those regions, will do different types of garbage collection within them, at different frequencies. And so again, it's dealing with the case where you will have some areas of memory which have long-lived objects, compared to those which have relatively short-lived objects; it will GC the latter more frequently, and re-arrange the memory to ensure that objects are placed in the most efficient place.
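For reference, the Balanced policy is selected with a command-line option on the IBM runtime (flag per IBM's Java 7 documentation; `MyApplication` and the heap size are placeholders):

```shell
# Select the region-based Balanced collector on IBM Java 7
# (-Xgcpolicy is an IBM J9 option; it does not apply to HotSpot).
java -Xgcpolicy:balanced -Xmx4g MyApplication
```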
12. That’s still based on the generational hypothesis, is it? Do any of the current programming models for parallel programming, so the actor model, message passing and so on, fit particularly well with Java?
That's right, yes. So there are certainly user libraries out there that implement these types of parallel programming models; again, it's a question of how well they map down onto the underlying Java representation. They could well be providing a lot of benefit to the programmer as an abstraction mechanism for thinking about their parallel programming design, while still ensuring that there's a very efficient implementation underneath.
We're focused on the Java memory model, obviously, and the locking model that Java provides. As a general comment, it's generally worth following the Java programming idioms, because those are the things that the JIT compiler is going to expect to see and is optimized for. And then, similarly, making heavy use of the class library APIs which, clearly, are ones which we're familiar with as well, and so we have optimized for those. And there are, as you say, different languages and approaches; there's the X10 language, from IBM, which is a parallel programming language in the Java family. But all these things, ultimately, depend upon the implementation underneath in the virtual machine itself.
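As a sketch of what "following the class-library idioms" means in practice, the example below uses java.util.concurrent primitives rather than hand-rolled synchronization; these are the paths the JIT and library implementers have tuned. The class name and counts are illustrative only:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IdiomSketch {
    // Count events from several threads using an AtomicInteger and an
    // executor pool, instead of a synchronized block around a plain int.
    public static int countInParallel(final int threads, final int perThread) {
        final AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        counter.incrementAndGet(); // lock-free increment
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all tasks to finish before reading the total.
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(4, 1000)); // prints 4000
    }
}
```

The same total could be computed with a `synchronized` counter, but the atomic class is the idiom the runtime recognizes and compiles down to efficient hardware instructions.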
So I guess the things I would like to see break down into two different areas: the things which are going to help the developer, and the things which are going to be part of the implementation changes. From the runtime side, again, the exploitation of hardware resources, ensuring that we're most effective in delivering the power that the hardware is providing. One of the advantages of working for IBM is that we have direct contact with the hardware designers and can actually influence hardware design. Similarly, I'm in close contact with the middleware teams, and so we have the integrated stack, top to bottom, to really deliver an end-to-end optimized solution.
From a developer perspective, continuing the theme of enabling people to write crisp, precise code that's going to be correct, and enhancements to the ability to visualize what's happening in the virtual machine. So some of the tooling around Health Center, which is a way of introspecting the virtual machine at runtime and seeing how things are operating, whether your program is operating to its full potential, and so on. So I think there's still plenty of excitement in this space, and lots of opportunities, so it's still a cool place to be working.
So there are a bunch of things which, I think, could be removed; and with modularity, I think we may even have the opportunity to sideline, if not completely remove, some of these items from the class libraries. History tells us that people use everything, at some point, but there are certainly areas which are not as well used as others. One of the goals of modularity is to enable people to draw crisp lines between various parts of the API and to take some of those things out, so you would be able to leave behind any areas which you weren't using.
It is painful to see Java VM talent leaving Oracle in droves; it does not bode well for the JVM. Thankfully there are other JVM languages.
GC is still going to be a problem.
Cores are cores; I don't expect Oracle or IBM to have any particular problem in scaling that way. But Java's GC is still unacceptable above 8 GB of heap.
Not that any dynamic language runtimes can do any better; they just throw hardware at the problem in massive clusters and waste money on middleware. But this Java memory model, even with virtualization, is not the future.
There are app servers on the market, like JBoss, that, if you deploy the whole stack, will not even start up without throwing out-of-memory exceptions on a 32-bit machine.
This is unacceptable, and modularization is a band-aid for memory constraints. So are virtualization and clustering.
The fact is, hardware is cheap, and we should be able to scale a single app server serving thousands of users on a single JVM with 250 GB of RAM.
I notice that the JSRs have given up on stack-allocated memory, which is probably a good thing given the not-so-sharp state of Java devs these days, but seriously, every shop I work in that uses Java runs into out-of-memory issues or huge locks, even when running on big iron.
As far as OSGi goes, I have to take a pass on that; it's a broken, DLL-hell way to manage dependencies at runtime, and IBM should bow out and let Oracle deploy its own methods here. Java modularization can't possibly be as bad as OSGi in practice.