JavaOne 2016 - Day 2 "Thinking in Parallel" - InfoQ

Once again day two at JavaOne featured live coverage of four session rooms. Links to the videos for those sessions are provided at the end of this article.

InfoQ attended the session ‘Thinking in Parallel’, which we now discuss (see video link under Ballroom 4 below). This session was presented by Stuart Marks and Brian Goetz of Oracle. The tag-team did a wonderful job of explaining parallelism with streams where Marks concentrated on why we would use streams and Goetz concentrated on why we would use parallelism.

Marks started with an example of converting an array of strings into upper case and showed the conventional approach as well as the streams approach.

The conventional approach used a for-loop as shown here:

Here the input array is processed sequentially in a left to right order. But the computation is just the “toUpperCase” call and each computation is independent of the other. Also, the records can be processed in any sequence or in parallel. So, how about using Streams?

In the streams case, the array input is mapped to upper case and then back into an array. The input and the output array can be in any order and the code didn’t need to specify any ordering; the entire array was uppercased.

Marks claims that the Streams version is better due to the higher level of abstraction, independence of computation, and aggregation.

The next example demonstrated splitting a list at a location specified by a predicate, as shown below:

Marks pointed out that the subList method takes arguments that are indexes into the original list. To accomplish the requirement our algorithm must split at the location of the #'s. To emphasize, Marks put bars at the #s, which would become the edges of the sublist, and then synthesized split points at each end to create exterior bounds as shown below.

Implementation of this algorithm using a conventional approach yields:

There are some accidental complexities, for example: start = cur + 1; and the call to result.add outside the for-loop. Hence, Marks observed –

Marks contrasted that to the streams approach, revisiting the problem where we are only interested in indexes. So, instead of the iterative for-loop, we can stream over the indexes. Since the predicate and the resulting split computations are independent of each other, we can perform the predicate test on each element and everything else can be independent. The resulting code (after eliminating the array copying to add to the array) would look like this:

Marks concluded with his observation that the Streams approach is clearer -

Goetz then complemented Marks's setup by considering the advantages and disadvantages of going parallel. He started off with the slide below talking about parallelism which is a tradeoff between burning more resources and getting the result faster.

Goetz emphasized that parallel computation will always need more task parallelism work, such as dividing the work, managing and coordinating the tasks and completion. Consequently if the problem is to succeed it must be parallelizable and there must be a good parallel framework (e.g. the fork-join library). Finally, there should be enough data to be processed in parallel. Goetz then provided examples where parallelism can be useful and showed the audience the divide-and-conquer code:

Goetz highlighted that all parallel solutions essentially look like the above divide and conquer code. Partitioning the problem avoids synchronization overhead. For an efficient parallel algorithm, the problem should be divided up early. Goetz then provided an example of summing an array in parallel and gave some performance considerations:

Finally, Goetz went on to consider parallel stream performance. Streams are easy to parallelize, but not all sources may be suitable for parallelization.

Array-based sources are best

According to Goetz, the "NQ model" is a simple model to calculate a rule of thumb likelihood of parallel speedup:

Goetz talked about "source splitting efficiency", saying that arrays split evenly and cheaply due to their underlying representation. Also keep in mind that some operations on the pipeline are dependent on the sequence encountered; for example - a "limit of 10" operation is obviously applicable to the first 10, in order. Another consideration is that sometimes merging can be expensive.

Goetz summarized the session as follows: