Performance Roundup: Heap Stacks Boost Threads in 1.8.x, MacRuby AOT, ZenProfile and EventHooks
Ruby 1.9 moved the Ruby world from 1.8.x's userspace ("green") thread system to native threads. While 1.9's native threads are still constrained by the GVL (Global VM Lock), which allows only one thread at a time to execute Ruby code, the switch to native threads brought other benefits.
Joe Damato explores one problem in Ruby 1.8.x's thread implementation that went away with native threads in 1.9. In short: context switches in 1.8.x are quite expensive, because each one copies a thread's complete stack contents: from the stack to the heap for the suspended thread, and from the heap back to the stack for the thread being scheduled. Applications with deep stacks or huge stack frames suffer from this implementation detail.
Native thread implementations avoid this inefficiency by giving each thread its own stack and simply switching between stacks. Joe's post describes in detail his "heap stacks", which bring this approach to Ruby 1.8.x.
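To get a feel for the effect, here is a hypothetical micro-benchmark (not the one from Joe's post): each thread recurses to a given depth and then repeatedly yields with Thread.pass, so every context switch happens while the thread's stack is deep. The method names and parameters are made up for illustration. If stack copying dominates, timings on an unpatched 1.8.x should grow with the depth, while on 1.9 or later they should stay roughly flat; actual numbers depend heavily on the interpreter and the machine.

```ruby
# Hypothetical micro-benchmark (not the one from Joe Damato's post). Each thread
# recurses `depth` frames deep, then yields `switches` times with Thread.pass,
# so every context switch happens while that thread's stack is deep.
# Avoids keyword arguments and Symbol#to_proc so it should also parse on 1.8.x.
def deep_then_switch(depth, switches)
  if depth > 0
    deep_then_switch(depth - 1, switches)
  else
    switches.times { Thread.pass }
  end
end

# Returns the wall-clock seconds needed for `threads` threads to finish.
def time_switches(depth, threads = 4, switches = 1_000)
  started = Time.now
  workers = Array.new(threads) { Thread.new { deep_then_switch(depth, switches) } }
  workers.each { |t| t.join }
  Time.now - started
end

[10, 1_000].each do |depth|
  printf("depth %5d: %.3fs\n", depth, time_switches(depth))
end
```

Comparing the two lines of output on different interpreters is the interesting part; a single run on one Ruby says little by itself.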
The performance improvements are significant, ranging from about 2x to about 10x, which brings the benchmark results close to those of 1.9.1.
Patched versions of the code are available on GitHub: for 1.8.6 and for 1.8.7.
The heap stacks work is yet another attempt to eradicate the biggest inefficiencies of Ruby 1.8.x, alongside the MBARI patches, which fixed some long-standing issues with continuations and the GC.
Another path to better Ruby performance is being taken by the MacRuby project, which recently started work on an LLVM-based VM. Some of that work has now been used to create an Ahead Of Time (AOT) compiler for Ruby. AOT here is in contrast to Just In Time (JIT) compilers: instead of compiling code at runtime, an AOT compiler run generates an executable from the source code:
The expression is compiled into LLVM IR, then bitcode, then assembly, then machine code. True compilation :-)
There are many scenarios where this is useful:
It will be useful for 1) code obfuscation 2) use Ruby on environments where dynamic code generation is not allowed
Finally, profilers help to find the bottlenecks in applications. Ryan Davis updated his zenprofile profiler, which uses event hooks in the Ruby runtime as an efficient way to track method invocations. Zenprofile has been around for some time, but the updated version now relies on the event_hook gem, which factors out the native code necessary for setting up the hooks. With event_hook, it's possible to write event hooks in pure Ruby instead of writing native code to hook into the Ruby interpreter. Zenprofile makes use of that by offering a pure Ruby version of its profiling logic, as well as a faster version that uses RubyInline and C.
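On Ruby 2.1 or later, the idea of a pure Ruby profiler driven by interpreter events can be sketched with the built-in TracePoint API, which postdates the event_hook gem and is used here only as a stand-in for it. The FlatProfiler class below is hypothetical and is not zenprofile's code: it records call counts and inclusive wall-clock time per Ruby-defined method from :call and :return events.

```ruby
# Hypothetical flat profiler (not zenprofile): counts calls and inclusive
# wall-clock time per Ruby-defined method. Uses TracePoint (Ruby 2.0+) and
# Process.clock_gettime (Ruby 2.1+) as a stand-in for 1.8-era event hooks.
class FlatProfiler
  Entry = Struct.new(:calls, :total)

  attr_reader :stats

  def initialize
    @stats = Hash.new { |hash, key| hash[key] = Entry.new(0, 0.0) }
    @starts = [] # stack of start times, one per active traced call
    @trace = TracePoint.new(:call, :return) do |tp|
      if tp.event == :call
        @starts.push(now)
      else
        started = @starts.pop
        next unless started # return from a call that began before tracing
        entry = @stats["#{tp.defined_class}##{tp.method_id}"]
        entry.calls += 1
        entry.total += now - started
      end
    end
  end

  # Runs the block with tracing enabled and returns the block's value.
  def profile
    @starts.clear
    @trace.enable
    yield
  ensure
    @trace.disable
  end

  private

  # Calls made from inside a TracePoint hook do not trigger further events.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

profiler = FlatProfiler.new
profiler.profile { fib(15) }
profiler.stats.each do |name, e|
  printf("%-12s %5d calls %.6fs\n", name, e.calls, e.total)
end
```

Because recursive calls overlap, the recorded time is inclusive; a real profiler would also track self time and C-level calls (:c_call/:c_return).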
A quick look at the zenprofile code shows that using event_hook is as easy as extending the EventHook class and overriding a few methods, such as self.process(event, obj, method, klass), to capture the events.
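The subclass-and-override pattern can be illustrated with a hypothetical MiniEventHook base class. It only mimics the shape described above (a class method process(event, obj, method, klass) that subclasses override); it is not the event_hook gem's API, it is driven by Ruby 2.0+ TracePoint rather than native hooks, and the start_hook and stop_hook names are invented for this sketch.

```ruby
# Hypothetical base class that mimics the "subclass and override process"
# shape described for event_hook's EventHook; it is NOT the gem's API.
# Events come from Ruby 2.0+ TracePoint instead of native interpreter hooks.
class MiniEventHook
  def self.start_hook
    # Inside this block, self is the subclass start_hook was called on,
    # so process dispatches to the subclass's override.
    @trace = TracePoint.new(:call, :c_call) do |tp|
      process(tp.event, tp.self, tp.method_id, tp.defined_class)
    end
    @trace.enable
  end

  def self.stop_hook
    @trace.disable if @trace
  end

  def self.process(event, obj, method, klass)
    raise NotImplementedError, "#{self} must implement self.process"
  end
end

# Example subclass: counts every traced call per "Class#method".
class CallCounter < MiniEventHook
  def self.counts
    @counts ||= Hash.new(0)
  end

  def self.process(event, obj, method, klass)
    counts["#{klass}##{method}"] += 1
  end
end

def greet(name)
  "hello #{name}"
end

CallCounter.start_hook
3.times { greet("ruby") }
CallCounter.stop_hook
p CallCounter.counts["Object#greet"] # prints 3
```

Keeping the hook setup in the base class means a subclass only decides what to do with each event, which is the division of labor the zenprofile code appears to rely on.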
Zenprofile also offers the spy_on feature, which can be used to focus on the performance of individual methods. The feature is configured with Ruby code; for example, to focus on Integer#downto:
require 'spy_on'
Integer.spy_on :downto
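How might a spy_on-style helper work? One possible approach, sketched here for Ruby 2.1 or later and not taken from zenprofile, is to prepend an anonymous module that wraps the target method, forwards to the original via super, and records call counts and elapsed time. SpySketch and its STATS hash are hypothetical names.

```ruby
# Hypothetical spy_on-style helper (NOT zenprofile's implementation): wraps one
# method by prepending an anonymous module (Ruby 2.1+) and records the number
# of calls and the total elapsed time for that method.
module SpySketch
  STATS = Hash.new { |hash, key| hash[key] = { :calls => 0, :time => 0.0 } }

  def self.spy_on(klass, method_name)
    key = "#{klass}##{method_name}"
    wrapper = Module.new do
      # Positional arguments and blocks are forwarded; keyword arguments are
      # not handled specially in this sketch.
      define_method(method_name) do |*args, &block|
        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        begin
          super(*args, &block) # the original method, found after the prepended module
        ensure
          stats = SpySketch::STATS[key]
          stats[:calls] += 1
          stats[:time] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        end
      end
    end
    klass.prepend(wrapper)
  end
end

SpySketch.spy_on(Integer, :downto)
sum = 0
10.downto(1) { |i| sum += i }
3.downto(1) { |i| sum += i }
p sum                                        # prints 61
p SpySketch::STATS["Integer#downto"][:calls] # prints 2
```

Wrapping a single method keeps the overhead confined to that method, unlike a global event hook that fires on every call in the program.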