The Last Flight of the Unladen Swallow
Unladen Swallow was an attempt to bring LLVM optimisations to the CPython runtime, but it hasn't seen significant activity in the last year. Now an Unladen Swallow retrospective confirms that the project is defunct and is no longer being developed.
The goal was an ambitious one: to add an LLVM-based JIT compiler to the CPython interpreter, and merge it back into CPython with an option to enable JIT compilation. LLVM is used in several high-profile projects, including the new modular Clang compiler and the LLDB debugger, both used by Apple's Xcode 4. These high-profile use cases made LLVM look attractive:
The initial choice to use LLVM was made because at the time none of us had significant experience with x86 assembly, and we really wanted to support x86 and x86_64 and potentially ARM down the road. We also believed that LLVM was a more robust JIT than it turned out to be. Apple was using the JIT engine in real products, and so we took that as a sign that it could work for us as well. While using LLVM helped us get off the ground very, very quickly, it quickly became a liability, and we ended up having to fix lots of bugs in the JIT support. It also brought with it lots of features we ended up not needing and which we had to spend time carving out in the name of memory footprint.
Compiler toolchains are notoriously difficult to get bug-free; a recent paper on finding and understanding bugs in C compilers reported the discovery of many bugs in widely used toolchains. However, Unladen Swallow's problems were rooted less in compiler bugs than in the dynamic, runtime nature of languages like Python:
Unfortunately, LLVM in its current state is really designed as a static compiler optimizer and back end. LLVM code generation and optimization is good but expensive. The optimizations are all designed to work on IR generated by static C-like languages. Most of the important optimizations for optimizing Python require high-level knowledge of how the program executed on previous iterations, and LLVM didn't help us do that.
Many of the optimisations in JVM JITs use knowledge of how the program is actually running: the JIT first gathers profiling data, and only performs further compilation once it has gathered enough information to act on. The single biggest benefit comes from inlining method calls, but this can't be done statically ahead of time, because the target of a dynamically dispatched call is not known until the program runs. Instead, other optimisations, guided by the profiling data, simplify the code until the method inlining can occur. For a Python JIT, function-call inlining is likewise key to delivering performance boosts, and it took some time to add on top of the underlying LLVM infrastructure.
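To make the idea of using "how the program executed on previous iterations" more concrete, here is a minimal Python sketch of type feedback at a single call site, the kind of profiling step a JIT performs before it can devirtualise and then inline a call. The `CallSite` class, its `warmup` threshold and the single-type (monomorphic) policy are hypothetical illustrations, not Unladen Swallow's or any JVM's actual mechanism. Plain Python cannot inline, so the sketch stops at caching the resolved method behind a type guard.

```python
from collections import Counter


class CallSite:
    """Hypothetical model of one call site, `obj.<method_name>(...)`.

    It records the receiver types it observes and, once a single type has
    been seen `warmup` times (and no other type), caches the resolved method
    behind a type guard. A real JIT would go further and inline the cached
    method's body; this sketch stops at the cached direct call.
    """

    def __init__(self, method_name, warmup=100):
        self.method_name = method_name
        self.warmup = warmup      # slow-path calls to profile before specialising
        self.seen = Counter()     # receiver type -> number of slow-path calls
        self.cached_type = None   # type the fast path is specialised for
        self.cached_func = None   # plain function resolved on that type

    def call(self, obj, *args):
        cls = type(obj)
        # Guard: if the receiver has the profiled type, skip the dynamic lookup.
        # (Assumes plain instance methods and no later monkey-patching of cls;
        # a real JIT would have to invalidate the cache in that case.)
        if cls is self.cached_type:
            return self.cached_func(obj, *args)
        # Slow path: record the type, then do a generic attribute lookup.
        self.seen[cls] += 1
        if sum(self.seen.values()) >= self.warmup and len(self.seen) == 1:
            self.cached_type = cls
            self.cached_func = getattr(cls, self.method_name)
        return getattr(obj, self.method_name)(*args)


if __name__ == "__main__":
    class Square:
        def __init__(self, side):
            self.side = side

        def area(self):
            return self.side * self.side

    site = CallSite("area", warmup=3)
    print([site.call(Square(n)) for n in range(5)])  # [0, 1, 4, 9, 16]
    print(site.cached_type is Square)                # True: specialised after 3 calls
```

The fast path is only taken after the call site has gathered evidence, and the guard keeps the result correct if a different type turns up later: that type simply falls back to the slow, generic lookup.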
(It should be noted that LLVM itself undergoes randomised testing, as described in the talk “Hardening LLVM with Random Testing”, presented at the November 2010 LLVM Developers' Meeting.)
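Randomised compiler testing of this kind is typically differential: generate random programs, run each through two implementations, and flag any disagreement. The toy Python sketch below shows that loop, assuming a hypothetical expression language of integer `+`, `-` and `*` only (no division, so there are no division-by-zero cases), with CPython's `eval` as the reference and a small tree-walking evaluator as the implementation under test. Real tools used on C compilers and LLVM generate far richer programs; none of the names here come from them.

```python
import random

OPS = ("+", "-", "*")


def gen_expr(rng, depth=3):
    """Random expression: an int leaf, or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(-10, 10)
    return (rng.choice(OPS), gen_expr(rng, depth - 1), gen_expr(rng, depth - 1))


def to_source(expr):
    """Render an expression tree as fully parenthesised Python source."""
    if isinstance(expr, int):
        return f"({expr})"
    op, left, right = expr
    return f"({to_source(left)} {op} {to_source(right)})"


def interpret(expr):
    """Implementation under test: a small tree-walking evaluator."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


def differential_test(impl, trials=1000, seed=0):
    """Return the source of every generated expression on which impl
    disagrees with CPython's own evaluator (the reference)."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        expr = gen_expr(rng)
        src = to_source(expr)
        # eval is safe here: src only ever contains generated ints and + - *.
        if impl(expr) != eval(src):
            mismatches.append(src)
    return mismatches


if __name__ == "__main__":
    print(differential_test(interpret))  # [] : no disagreements found
```

Fixing the seed makes any mismatch reproducible, which matters when the failing program has to be reduced and reported as a bug.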
However, none of this helped the Unladen Swallow, whether African or European, take off. Partly the problem was one of sponsorship: the majority of Python users weren't using it for performance-intensive tasks, so the pay-off from the optimisations was limited. Secondly, the key developers behind CPython were largely uninterested in LLVM, with the result that even had it been merged, it would likely have been disabled by default and removed further down the line.
There is also VMKit, a base for building higher-level language runtimes on top of LLVM, including support for objects and automatic memory management; however, it is targeted more towards the Java and .NET runtimes.
The Unladen Swallow developers are now referring others to PyPy instead, another Python runtime, which uses a custom JIT to speed up execution. Part of the problem with speeding up Python is that not all code is “pure” Python; there are many native extensions implemented in C which need to be handled correctly. (Jython, an implementation in Java, does not handle the C-implemented features of CPython directly, but instead rewrites them in Java.) But perhaps the biggest execution bottleneck, common to many interpreted programming languages, is the Global Interpreter Lock, which prevents multiple threads from executing Python code in parallel. Neither PyPy nor Unladen Swallow could change that aspect.
LLVM 2.9 is due to be released next week. It is worth noting that other projects, such as Rubinius, are also using LLVM under the covers as their JIT engine.
Roy Rapoport Aug 28, 2014