Concurrency Revolution From a Hardware Perspective
Brian Goetz and Cliff Click spoke at the JavaOne conference last week about the concurrency revolution from a hardware perspective. They opened by noting that clock rates had been increasing exponentially until recently, but no longer are. For years, CPU designers focused on increasing sequential performance with techniques like higher clock frequencies and Instruction Level Parallelism (ILP), but this approach has its limits. Going forward, designers will focus on parallelism to increase throughput.
Brian gave an overview of the main periods in CPU history, which include the CISC and RISC eras and now multi-core machines. Cliff discussed the impact of data caching on system performance. As a general principle, he said, developers should think about data, not code: data locality should be a main design concern for high-performance software. It's also important to follow the principle of "share less, mutate less." Sharing mutable data is undesirable because it causes cache contention and requires synchronization.
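As a rough illustration of "share less, mutate less" (not code from the talk), the sketch below sums an array by giving each task its own contiguous slice and a task-local accumulator, so nothing mutable is shared while the work runs and partial results are combined only at the end. The class name `ShareLess`, the method `sumInSlices`, and the slice count are hypothetical; the sketch assumes Java 8 or later for the lambda and that `slices` is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not part of any library.
class ShareLess {

    // Sums data using `slices` tasks (assumed >= 1). Each task reads one
    // contiguous slice (good locality) and writes only to its own local
    // variable, so no locks or atomics are needed during the loop.
    static long sumInSlices(final int[] data, int slices) {
        ExecutorService pool = Executors.newFixedThreadPool(slices);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.length + slices - 1) / slices; // ceiling division
            for (int s = 0; s < slices; s++) {
                final int from = Math.min(data.length, s * chunk);
                final int to = Math.min(data.length, from + chunk);
                partials.add(pool.submit(() -> {
                    long local = 0; // confined to this task
                    for (int i = from; i < to; i++) {
                        local += data[i];
                    }
                    return local;
                }));
            }
            // The only point where results meet: one read per task.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sumInSlices(data, 4));
    }
}
```

The alternative, having every task increment one shared counter, would force the cache line holding that counter to bounce between cores and would need synchronization on every update.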
Some of the point solutions to achieve concurrency in applications are:
- Thread Pools and Worklists: The thread pool and work queue approach is a reasonable solution for coarse-grained concurrency, which includes server applications with medium-weight requests such as database, file, and web servers. Library support for this was added in JDK 5.
- Fork/Join: The Fork/Join technique is used for recursive decomposition. It is a good approach for lightweight CPU-bound problems that fit in memory, but not so good for I/O-bound operations. The Fork/Join framework is already available in the OpenJDK project and will be part of the JDK 7 release.
- Map/Reduce: The Map/Reduce approach is used to decompose data queries across a cluster. It was designed for very large input data sets (usually distributed), and the framework handles concerns like distribution, reliability, and scheduling. Open-source implementations of Map/Reduce, such as Hadoop, are available.
- Actors: In the Actors computing model, no state is shared and all mutable state is confined to actors. Actors communicate by sending messages to each other. This model works well in Erlang and Scala and is possible in Java, but requires more discipline to implement there.
- Software Transactional Memory (STM): The Software Transactional Memory approach has been sold as "garbage collection for concurrency". It works in the Clojure language because Clojure is mostly functional and limits mutable state.
- Graphics Processing Units: Graphics Processing Units (GPUs), which have several simple cores, are great for applying the same operation to a lot of data. They are widely supported by APIs like CUDA, OpenCL, and Microsoft’s DirectCompute.
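To make the Fork/Join item above more concrete, here is a minimal sketch of recursive decomposition using the `java.util.concurrent` classes that shipped in JDK 7 (`ForkJoinPool` and `RecursiveTask`). It is an illustration rather than anything shown by the speakers: the class name `MaxTask`, the helper `max`, and the `THRESHOLD` cutoff are assumptions chosen for the example, and a real cutoff would need tuning.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task that finds the largest element of data[from, to).
class MaxTask extends RecursiveTask<Integer> {
    // Assumed cutoff below which the range is scanned sequentially.
    static final int THRESHOLD = 1_000;

    private final int[] data;
    private final int from;
    private final int to;

    MaxTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: solve directly. An empty range yields MIN_VALUE.
            int max = Integer.MIN_VALUE;
            for (int i = from; i < to; i++) {
                max = Math.max(max, data[i]);
            }
            return max;
        }
        // Otherwise split in half: fork the left half so another worker
        // can steal it, compute the right half in this thread, then join.
        int mid = (from + to) >>> 1;
        MaxTask left = new MaxTask(data, from, mid);
        MaxTask right = new MaxTask(data, mid, to);
        left.fork();
        int rightMax = right.compute();
        int leftMax = left.join();
        return Math.max(leftMax, rightMax);
    }

    // Convenience entry point that owns a pool for a single computation.
    static int max(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new MaxTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 977;
        }
        System.out.println(max(data));
    }
}
```

The task only reads an in-memory array and never blocks, which matches the CPU-bound, fits-in-memory profile described in the Fork/Join item; blocking I/O inside `compute()` would tie up worker threads.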
The speakers concluded the presentation by saying that CPUs have grown under the hood, the performance model has changed, and the new approach is to have more, simpler cores. They also suggested that developers think about parallel computing requirements from the initial phases of the application development process.
I think that the available APIs, frameworks, and libraries help not only as implementation tools; more importantly, they are instrumental in changing the programmer's thought process. There is some quote along the lines of "constraints breed innovation," and it applies when building parallel programs. There are things you wish you could do, but these APIs, frameworks, and libraries steer you away from them. In retrospect you realize what a shortsighted idea it was and pat your framework of choice on the back.
Regarding education, I think that more rigorous fundamental comp. sci. courses should take priority over parallelism education. I took a few of them (less than 5 years ago), so I don't understand the widespread press about how "programmers are going to have to get ready" and "major shifts in their thinking will be needed." Get ready? Too late! These concepts have been around since shortly after the dawn of actual physical computer hardware (maybe earlier?). The concepts are being taught at major universities. I remember being confused about certain topics at the time, but it was mainly because I lacked in-depth working knowledge of some fundamentals. Also, not every problem can magically be solved with more cores churning.
As an undergrad I was fortunate enough to take the late Per Brinch Hansen's graduate course on parallel programming. He was thoughtful and expertly detailed exactly what needed to be done to achieve concurrency and parallelism, and why, all the while poking fun at Java's threads and telling anecdotes about Denmark. I believe he was publicly vocal about it as well.
Keith Adams Dec 06, 2013