MIT Extended LLVM IR to Enable Better Optimization of Parallel Programs

Researchers at MIT have been working on a fork of LLVM to explore a new approach to optimizing parallel code by embedding fork-join parallelism directly into the compiler’s intermediate representation (IR). This, the researchers maintain, makes it possible to leverage most of the IR-level serial optimizations for parallel programs.

Fork-join parallelism is a way to organize parallel programs that is especially suited to divide-and-conquer algorithms, such as merge sort. It is supported in mainstream compilers such as GCC and LLVM through linguistic extensions, for example those provided by OpenMP (e.g., #pragma omp parallel, #pragma omp parallel for, and others) and Cilk Plus (cilk_spawn and cilk_sync). The compiler front-end handles those extensions by “lowering” the parallel constructs to a more primitive representation, which is then translated into IR. For example, the code snippet below uses the Cilk Plus cilk_for extension to allow each iteration of the loop to run in parallel:

__attribute__((const)) double norm(const double *A, int n);

void normalize(double *restrict out,
               const double *restrict in, int n) {
  cilk_for (int i = 0; i < n; ++i)
    out[i] = in[i] / norm(in, n);
}

One drawback of this approach, though, is that the compiler middle-end no longer sees a for loop, only opaque runtime calls: the front-end outlines the loop body into its own function and passes it to a library routine that handles spawning the loop iterations and synchronizing them afterwards. This effectively prevents the middle-end from applying the optimizations it would normally perform on for loops at the IR level, such as loop-invariant code motion, scheduling, and so on.
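
To make this concrete, the sketch below shows roughly what the middle-end is left with after such a lowering. It is a simplified, hand-written illustration rather than actual compiler output, and the runtime entry point __parallel_loop_spawn and the outlined helper normalize_body are made-up names: the loop body has been extracted into its own function, and the loop itself has collapsed into a single opaque call.

; Simplified, illustrative IR only -- the real Cilk Plus lowering differs in detail.

; The loop body, outlined by the front-end into its own function.
define internal void @normalize_body(i8* %ctx, i64 %i) {
  ; ... unpack out, in and n from %ctx, then compute out[i] = in[i] / norm(in, n)
  ret void
}

define void @normalize(double* %out, double* %in, i64 %n) {
entry:
  ; ... package out, in and n into %ctx for the helper
  ; The original for loop is gone: the middle-end sees only this opaque call.
  call void @__parallel_loop_spawn(void (i8*, i64)* @normalize_body, i8* %ctx, i64 %n)
  ret void
}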

Schardl, Moses, and Leiserson’s work brings the fork-join model directly into the middle-end through an extended IR, thus enabling all kinds of optimization strategies to be applied to the code before any extra code required for parallel execution is added. This is not in itself new, and several IRs have been devised specifically to represent parallelism in a program; however:

...adopting a separate IR into a mainstream compiler has historically been criticized as requiring considerable effort to engineer, develop, and maintain the additional IR to the same standards as the compiler’s existing serial IR.

The key contribution is that the three MIT researchers have found a way to extend LLVM’s IR to represent logical task parallelism while preserving the program’s serial semantics. Tapir, as their new IR is called, represents parallel tasks asymmetrically: a detached task must run to completion before the parallel control flow can be synchronized. According to the researchers, it is this property that makes it possible for an essentially serial middle-end such as LLVM’s to optimize parallel code efficiently.

Tapir extends LLVM’s IR by adding three new instructions: detach, reattach, and sync. While detach roughly corresponds to the traditional fork abstraction, reattach and sync together represent a slight departure from join. This is motivated by the requirement that, for the program to remain serializable, the detached block must execute to completion before control branches to the continuation. Thus, while detach and reattach mark the start and end of a parallel task, sync waits for all tasks spawned within its parallel context.
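
Under these semantics, the normalize loop from the earlier snippet can be expressed directly in Tapir along the following lines. This is again a simplified, hand-written sketch rather than actual Tapir output, and the exact operand forms of the new instructions differ in the real IR; the point is that the induction loop stays visible to the middle-end, with detach, reattach, and sync merely marking which blocks may run as parallel tasks.

; Simplified, illustrative sketch of the Tapir form (assumes n > 0 for brevity).
define void @normalize(double* %out, double* %in, i64 %n) {
entry:
  br label %loop

loop:                                   ; the loop structure remains visible to the optimizer
  %i = phi i64 [ 0, %entry ], [ %i.next, %continue ]
  detach label %body, label %continue   ; spawn this iteration as a parallel task

body:                                   ; detached block: runs to completion, then reattaches
  ; ... out[i] = in[i] / norm(in, n)
  reattach label %continue

continue:
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  sync                                  ; wait for all tasks detached in this context
  ret void
}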

To evaluate the benefits of their new approach, the MIT researchers compared a Cilk-enabled LLVM compiler with a Tapir-enabled one when compiling a suite of 20 Cilk programs.

For 17 of the 20 programs, the compiler using the new intermediate representation yielded more efficient software, with gains of 10 to 25 percent for a third of them. On the programs where the new compiler yielded less efficient software, the falloff was less than 2 percent.

Tapir is available on GitHub and can be built by running:

git clone --recursive https://github.com/wsmoses/Tapir-Meta.git
cd Tapir-Meta/
./build.sh
source ./setup-env.sh
