Doug Lea Discusses the Fork/Join Framework


1. My name is Ryan Slobojan and I'm here with Doug Lea, a professor at the State University of New York at Oswego. Doug, can you tell us a little bit about the Fork/Join Framework that you've been working on?

Sure. Fork/Join is a little engine that is intended to make it easier to get high-performance, parallel, fine-grained task execution in Java. It's a little bit of a departure for us in java.util.concurrent, because mainly we've been concentrating on server-side asynchronous communication. We've actually had this framework around for a while: I first created a version of it in 1998 and wrote about it then, and we've been waiting for ubiquitous multicore and multiprocessor computers to arrive, so that we don't put in something that doesn't actually help most people.

We've also had the luxury of having a few years to get it right in the meantime, because we didn't ship it with Java 5.0. It was actually triaged off the list of things to put out in java.util.concurrent.

We're also extremely conservative engineers. We think real hard about the code in java.util.concurrent before we release it, and we usually take our time and just ship it when it's done.

The basic idea is that it's a framework that takes a little bit of getting used to if you program directly in it. It's a framework that most enjoys executing parallel, recursively decomposed tasks. That is, if you have a big problem, divide it in two, solve the two parts in parallel, and then when they're done, combine the results. It turns out that, in the same way you can build anything from recursion, you can build things that operate on arrays, on maps, on anything, out of that. Because we're just a library, that's where we stop. We have an engine that has a Fork/Join task, a Fork/Join pool, and a few other little helper classes here and there. Those people who enjoy doing this - and there are a few - are getting really good results. We have really excellent performance. It depends of course on the problem you are solving, but we are capable of producing code with at least as good absolute performance and speed-ups as frameworks in other languages.
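As an illustration of that divide-in-two pattern, here is a minimal sketch using the framework's ForkJoinPool and RecursiveTask classes; the array-summing task and the sequential threshold are hypothetical choices for the example, not something described in the interview.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sum an array by recursively splitting it in two,
// solving the halves in parallel, then combining the results.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, just compute sequentially
    private final long[] array;
    private final int lo, hi;

    SumTask(long[] array, int lo, int hi) {
        this.array = array; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {              // small enough: solve directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += array[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(array, lo, mid);
        SumTask right = new SumTask(array, mid, hi);
        left.fork();                             // the left half becomes stealable by idle workers
        long rightSum = right.compute();         // work on the right half ourselves
        return left.join() + rightSum;           // combine when both halves are done
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1L);
        System.out.println(new ForkJoinPool().invoke(new SumTask(data, 0, data.length)));
    }
}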

Part of this is that we have the luxury of building this framework on systems that have, for example, high-performance, scalable, concurrent garbage collection, because garbage collection in these frameworks turns out to be an interesting issue. If you are implementing them, say in C, you generally have to slow down a lot of things in order to get the memory allocation right. In our case, we will spew lots of garbage, but it's very well behaved garbage and it's picked up by the garbage collector because it is unused 99% of the time.

The whole idea is that when you think you might do something in parallel, you make a little task, you do this internal, very slick work-stealing queue manipulation, and then you see if anybody steals it from you. If anybody steals it from you, it's because they didn't have enough to do when they decided to do some of your work for you. Once you get the hang of that, there's a lot of things you can do. We're also, though, very conscious these days of building libraries for platform support. I think that there is upcoming support in Scala for nicer syntax, for automatically generating some of these recursive decompositions.

Clojure has been doing this for a while; X10, the IBM high-performance language, will be doing this; Fortress in its current implementation uses our framework, as do several other research projects and experimental systems. We're providing fine-grained parallelism and control for the languages that run on JVMs, rather than for Java in particular, and that's really the way I think about this work - that I write it in Java on a VM because I can create very efficient algorithms.

I really don't care what syntax people use to access it, which is a little bit of a shift in focus for those of us building libraries in Java: we were first mainly concerned with getting the Java APIs right, and now we really want to get the functionality broad enough that, regardless of whether you are programming in Java or Scala or whatever, you can still benefit.

   

2. What are some of the tasks which are supported out of the box with the Fork/Join Framework?

That's a good question, because the answer in the initial release, which should be going into one of the OpenJDK milestones very, very soon, is "Almost nothing". That is, it is very possible to build things like arrays where every time you want to operate you say "apply to all" or "map" or "reduce", which are increasingly familiar notions. We don't ship those in Java because we are really unsure of language directions. With closures, function types, and other miscellany like that, if they are going to be in the language, you would support these operations in Java very differently than the way we would have to otherwise.

We chickened out; we are not going to release the layers on top of this ourselves. We make them available, so there is a package called extra166 that has all the things that we think are shippable but we don't ship, because we're very conservative engineers. When you are putting code on a billion machines, you really don't want to make the serious mistake of putting something out that you will somehow need to retract and pull back in a year or two. That means that, right now, the people who are using this framework are going to be the people who actually get into this parallel recursive decomposition and know how to use it.

There are actually many things these frameworks can do. Not only, for example, are things like parallel for-each loops possible, but both the Clojure and Scala actor frameworks use the same engine, with different parameter settings than you would use for parallel decomposition. The basic work-stealing framework is available to do a lot of things, not just "apply to all" for arrays.
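To make the "apply to all" idea concrete, here is a minimal, hypothetical sketch of a parallel for-each over an array built on the same engine with RecursiveAction; the per-element operation and the threshold are made up for the example.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical "apply to all": recursively split the index range and
// apply an operation to every element, letting idle workers steal halves.
class ApplyToAll extends RecursiveAction {
    private static final int THRESHOLD = 10_000;
    private final double[] data;
    private final int lo, hi;

    ApplyToAll(double[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            for (int i = lo; i < hi; i++)
                data[i] = Math.sqrt(data[i]);    // the per-element operation
        } else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new ApplyToAll(data, lo, mid),
                      new ApplyToAll(data, mid, hi));
        }
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        java.util.Arrays.fill(data, 4.0);
        new ForkJoinPool().invoke(new ApplyToAll(data, 0, data.length));
        System.out.println(data[0]);             // prints 2.0
    }
}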

   

3. Some of the extras that you described seem like they would fit very well into an Apache Commons-type framework. What are your thoughts on that?

Our main mission is that we want to get what we think are the central, most important bits of functionality into the JDK proper. So again, we're a little conservative - in cases where we weren't as conservative as we wanted to be, we've already regretted it. There are components that people have asked for for years that we still don't put in because we are not fully happy with them.

We have many, many requests for a concurrent cache, a variant of a concurrent hash map that has some sort of reclamation policy and discards and the like. We're not satisfied with any of our solutions, so we don't ship it. We do, however, put in our source repositories all the things we think are shippable that we don't ship. There are people who use them, and they are not as well engineered, but there are many people using several things that have never made it into the JDK.

We have, for example, a really pretty good non-blocking, double-ended queue we put together for Java 6.0, and then decided that the demand for it was not really enough to outweigh the added maintenance burden and API support and all of that, for something that really wasn't used very much. We do that, too. What it means is that when you look in java.util.concurrent you see our best guesses of what you might want to use, and again, for really different audiences.

Classically, when we started out, our main audience by far was people doing server-side stuff: lots of clients, heavily concurrent, lots of threads to maintain, and so we have a lot of components that are really good for that level of thing. As multicores and MPs get more prevalent, we want to support finer-grained stuff. That's where we're heading, and then we'll do it when we're good and ready.

   

4. What are your thoughts on the future direction for the Fork/Join Framework?

We have frameworks that are really well tuned for the medium-term future, the future of dozens to hundreds of cores or CPUs, with enough data parallelism to keep them happy. We don't have remote execution, we don't have any clustering support: so if you want to split up a large task across several multicores, each running in a Sun box, we don't do that.

If you would like to try to run on more than 1,000 cores, we're a little scared and nervous because we don't actually know very much about the scalability of some of this. We're obsessive about testing real performance on real platforms, as they are developed. We're testing on the little two-way boxes, dual 8-core Nehalems, dual Niagara 64-thread boxes, Azul boxes, but for the next step, where people are thinking "Maybe we'll get to 1,000 cores", we're not so sure. There are some scalability issues that are maybe going to make us rethink how we go about a fair amount of it.

The great thing about concurrent programming is you never get old. There is an infinite number of new problems that keep attacking you. [For example] the increasing cost of cache misses on multiprocessor multicores: if you have two Intel i7s, then you have a very different box than with one of them - a really different box - and we try to put out code that does not cheat and say "Well, let's look at what kind of box we're on"; we put out code that we're pretty confident will run on a generation of machines.

Our ability to predict past a few years is non-existent. That's always been so. There are several things that were in the initial release of java.util.concurrent that are going to need a little bit of maintenance upkeep, because there are engineering trade-offs hiding in some of the implementations that don't make as much sense as they used to. We will be doing a little bit of routine maintenance after all these big things about JDK 7 come out, hopefully soon.

   

5. There are many languages which have many approaches to parallel computing. In an absolutely ideal universe, given the way that parallel programming seems to be progressing with multiple cores, what do you think is the best, most intuitive approach to parallel programming, from the programmer's perspective?

That's an unanswerable question. We had a workshop on teaching concurrency here at OOPSLA on Monday, and there are many ideas, but I think everyone believes that every student - because we were talking about teaching - but really every developer, should have some understanding of coordinating asynchronous processes. That is, maybe you generate threads, maybe you use some locks, maybe you decide to do it using events and actors that are sending messages. This is coordination of naturally asynchronous stuff. Why do you create a thread? Because you have another client, or because you have another node in your simulation of a biological process.

It's not because you want a speed-up; it's just that's the way the world is and you have to learn how to coordinate it. The much over-hyped other side is: well, you have these things, why don't you make your programs faster? They are really different points of view and I think everyone needs to understand them well, in part because parallelism for the sake of speed-up can be very easy.

If you have decent language, tool, or other support for saying "All the operations in this loop could run in parallel" - there are languages being devised, and several were reported on here at OOPSLA in the past few days - maybe we can get people to annotate their code, so that they wouldn't have to guess and we could prove that it was OK for this loop to run in parallel. I'm very much for that; I wish them a lot of success.

My role is as a library guy. I don't pretend to know very much about language design, but I am very happy to yell at them as long as it takes to see my point of view about what the underlying implementations would like to do. The short answer is everybody should understand data parallelism - apply to all, for each, map-reduce - and that's good because most people do understand that.

The other aspect is intrinsically a little harder. When you have concurrency that is not really of your choosing, then usually it's less regular and you have to use more custom techniques. That is why we have a lot of these custom techniques in java.util.concurrent. So we have phasers and barriers and countdowns and a bunch of things that are all really good solutions to a class of problems, with no aspiration to universality. If you are doing resource control, use a semaphore. There is a classic way to do that and it works well; don't use any of these other things, please.
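As a small illustration of the "use a semaphore for resource control" advice, here is the classic pattern with java.util.concurrent.Semaphore; the permit count and the notion of a "connection" are made-up details for the sketch.

import java.util.concurrent.Semaphore;

// Classic resource control: at most PERMITS threads may use the limited
// resource at once; everyone else blocks in acquire() until a permit frees up.
class ConnectionGate {
    private static final int PERMITS = 10;       // hypothetical limit
    private final Semaphore available = new Semaphore(PERMITS, true); // fair ordering

    void useConnection() throws InterruptedException {
        available.acquire();                      // wait for a free slot
        try {
            // ... use the limited resource here ...
        } finally {
            available.release();                  // always give the slot back
        }
    }
}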

So there is a bit more domain knowledge and a deeper understanding of concurrency needed to build a good scalable server than to build a good parallel ray tracer. A good parallel ray tracer is comparatively easy, but just as important, and that's why we're supporting it. We believe that everyone should be able to get parallel speed-ups without having to invent their own work-stealing framework, which took us about a decade to produce the way we like it.

   

6. One of the difficulties that I've encountered as a developer is that I can usually reason very easily about the behavior of a sequential program, but as soon as parallelism comes in and things can seemingly happen almost at random, I have much more difficulty with that. How can I reason more effectively about a parallel program in Java?

Another good and not completely answerable question. We do think that there is a good strong chance that languages can evolve to help here - one of the buzzwords these days is deterministic parallelism. That is parallelism that gives the same answer every time, at least to the extent you are allowed to know anything about the computation. So if you do a parallel operation on array elements and all those operations are independent, then you can't tell that they've operated in parallel, so long as the thing you've asked them to do has no weird side effects, doesn't affect globals, doesn't have any ordering constraints.

For those kinds of things, I think there is actually language help on the way. There are people working on these issues, and several of them use annotation-style processing. Some of them you can think of as an extension of Mike Ernst's work on adding more information to types about constness and purity. I think fairly happy thoughts about that aspect. The aspect that I think will always be difficult is building very high-performance, highly scalable, concurrent data structures; that's a difficult challenge.

Most people want them, but most people don't want to create their own red-black trees or B-trees either. Why do that yourself? Let those of us who are willing to spend a year finding out how to do a pure lock-free, nonblocking queue implementation do it, because we love that. That's my favorite thing to do in the whole world - to come up with new nonblocking concurrent algorithms - and they are very hard to reason about and they are very low-productivity components. It will take a year, off and on, to put out 500 lines of code, but we really hope that no one else has to do it.

I do it, people like Cliff Click do it once in a while, a few people do it, and we invite everyone else: "Please learn enough so you can join us, because we need all the help we can get". But as for the number of people who have made it through - Maurice Herlihy and Nir Shavit have a new book called "The Art of Multiprocessor Programming", and it already has a fairly tough reputation. It's on creating nonblocking algorithms. In a sense, the subtitle of the book is "The Underlying Algorithms of java.util.concurrent", because many of them are. We'd like people to learn about them, but we don't want most engineers to implement them. They have work to do.

   

7. What are some of the differences between the java.util.concurrent libraries and the java.util libraries?

There is a little bit of a difference between java.util.concurrent and java.util, where java.util.concurrent really has this mission: put out the best algorithms and data structures we know. In java.util - HashMap or Vector or things like that - the whole mind-set is to put out something that has no performance anomalies, which is a really different question.

With the java.util HashMap, you could probably do better - if you have a hash map full of ints, int keys - but only if you use it very carefully and knowingly in those cases. If you use the general-purpose one there, maybe you'll get a little bit of blow-up because of boxing and things like that, but it is a very regular algorithm and we put a high priority on lack of surprise. There is a little bit of a difference with the concurrent work, where part of it is that we specialize.

We have more kinds of queues, for example, than most people ever want to know about, and we apologize that it is a little confusing; even worse, we're adding yet another one in Java 7.0. Queues of course are very central to the internals of concurrency, because when threads are waiting they are put in queues, when messages are being sent they are put in queues, and when producers and consumers are exchanging data, it is put in queues.

So there are a lot of kinds of queues, and we don't make too many apologies for the fact that we are putting our eighth into JDK 7.0. It is just right for what it does - it's called a linked transfer queue - and very efficient. It includes a brand new, pretty cool algorithm that's an extension of, and an improvement over, others that have been published, and - for those two people in the audience who want to know - its main virtue is that it can do both asynchronous and synchronous transfers, so you can send a message and forget about it without waiting, or you can send a message and wait for the receiver to get it.

We support both of those, which is actually sort of uncommon and algorithmically tricky, but increasingly needed. In fact, we needed it internally in the Fork/Join pools. We mainly developed it for ourselves and then it took on a life of its own, as we found that many people actually need this functionality. Not many application-level programmers do, but people who are building server-side frameworks and the like do.
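For the curious, here is a small sketch of the asynchronous-versus-synchronous distinction he describes, using the LinkedTransferQueue that shipped in JDK 7; the string payloads and the single consumer thread are just placeholders for the example.

import java.util.concurrent.LinkedTransferQueue;

// put() hands off asynchronously (fire and forget); transfer() blocks the
// producer until a consumer has actually received the element.
public class TransferDemo {
    public static void main(String[] args) throws InterruptedException {
        final LinkedTransferQueue<String> queue = new LinkedTransferQueue<String>();

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    System.out.println(queue.take()); // receives "fire-and-forget"
                    System.out.println(queue.take()); // receives "handoff"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.start();

        queue.put("fire-and-forget");   // asynchronous: returns immediately
        queue.transfer("handoff");      // synchronous: waits until the consumer takes it
        consumer.join();
    }
}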

Jan 21, 2010
