The Futures of Ruby Threading

A recent interview with Matz (Yukihiro Matsumoto), creator of Ruby, and Sasada Koichi, creator of YARV, tackles the topic of Ruby's handling of threads. Current stable releases of Ruby use user space threads (also called "green threads"), which means that the Ruby interpreter takes care of everything to do with threads. This is in contrast to kernel threads, where creation, scheduling and synchronization are done with OS syscalls, which makes these operations costly, at least compared to their equivalents in user space threads. User space threads, on the other hand, cannot make use of multiple cores or multiple CPUs, because the OS doesn't know about them and thus can't schedule them on these cores/CPUs.

Ruby 1.9 has recently integrated YARV as the new Ruby VM, which, among other changes, has brought kernel threads to Ruby. The introduction of kernel threads (or "native threads") was widely welcomed, particularly by developers coming from Java or .NET, where kernel threads are the norm. However, there's a snag. Sasada Koichi explains:
 As you know, YARV support native thread. It means that you can run each  Ruby thread on each native thread concurrently.

It doesn't mean that every Ruby thread runs in parallel. YARV has  global VM lock (global interpreter lock) which only one running Ruby thread has. This decision maybe makes us happy because we can run most of the extensions written in C without any modifications.
This means that no matter how many cores or CPUs are available, only one Ruby thread can run at any given time. There are workarounds: native extensions can handle the Global Interpreter Lock (GIL) more flexibly, for instance by releasing it before starting a long-running operation. Sasada Koichi explains the API available for releasing the GIL:
You must release Giant VM Lock before doing blocking task. If you need do this in extension libraries, use rb_thread_blocking_region() API.

rb_thread_blocking_region(
  blocking_func, /* function that will block */
  data,          /* this will be passed to the above function */
  unblock_func   /* if another thread causes an exception with Thread#raise,
                    this function is called to unblock, or NULL */
);

The problem: for Ruby code, this effectively removes the biggest argument for kernel threads (the use of multiple cores or CPUs) while retaining their drawbacks.
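The practical effect of the GIL can be observed from Ruby itself. This is a minimal sketch that assumes CRuby (MRI), where Kernel#sleep, like most blocking I/O, releases the global VM lock; the iteration count is arbitrary, timings depend on the machine, and on an implementation without a GIL, such as JRuby, the CPU-bound threads could run in parallel.

```ruby
require "benchmark"

# Five threads that each block in sleep. CRuby releases its global VM lock
# while a thread sleeps (as it does for most blocking I/O), so the waits overlap.
blocking = Benchmark.realtime do
  5.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

# Pure-Ruby arithmetic: the thread must hold the lock to make progress.
def busy_work
  x = 0
  1_000_000.times { |i| x += i }
  x
end

# On CRuby only the lock holder runs Ruby code, so these five threads take
# roughly as long as calling busy_work five times in a row.
cpu_bound = Benchmark.realtime do
  5.times.map { Thread.new { busy_work } }.each(&:join)
end

puts format("blocking threads:  %.2fs", blocking)
puts format("CPU-bound threads: %.2fs", cpu_bound)
```

Comparing the CPU-bound timing against five sequential busy_work calls on a multi-core machine shows the point of the paragraph above: under CRuby the threads interleave rather than run in parallel.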

Kernel threads are also the reason why Continuations might be removed in future Ruby versions. Continuations can be used for cooperative scheduling, in which one thread of execution explicitly hands off control to another one. The closely related concept of coroutines has been around for a long time. Recently, continuations moved into the public eye because of the Smalltalk-based web framework Seaside, which uses them to significantly simplify web apps.
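Cooperative hand-off of control can be sketched in a few lines of Ruby. This minimal sketch uses Ruby's Fiber class (added in Ruby 1.9) rather than Continuation/callcc, because fibers express the same explicit hand-off more directly; the ping/pong tasks and the round-robin loop are illustrative only.

```ruby
log = []

# Each fiber runs until it explicitly calls Fiber.yield, handing control
# back to whoever resumed it. Nothing preempts it.
ping = Fiber.new do
  3.times do |i|
    log << "ping #{i}"
    Fiber.yield
  end
end

pong = Fiber.new do
  3.times do |i|
    log << "pong #{i}"
    Fiber.yield
  end
end

# A trivial round-robin "scheduler": each resume runs one fiber until it yields.
3.times do
  ping.resume
  pong.resume
end

puts log.join(", ")
```

Because every switch happens at an explicit Fiber.yield, there are no races between the two tasks, but also no parallelism: only one fiber ever runs at a time.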

The approach of combining kernel threads with a GIL is comparable to Python's thread system, which also uses a GIL and has done so for a long time. Python's GIL has caused countless debates about how to remove it, but it has stuck around all this time.

However, the thoughts of Guido van Rossum, Python's creator, on threads point to an alternative future for Ruby threading. In a recent post about the GIL, Guido van Rossum explained:
Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.

Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads.

Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions.
The benefits of preemptively scheduled threads that share an address space have long been debated. For the longest time, Unix was single-threaded or user space threaded. Parallelism was implemented with multiple processes that communicated via various means of Inter-Process Communication (IPC), such as pipes, FIFOs, or explicitly shared memory regions. This was supported by the fork syscall, which allows a running process to be duplicated cheaply.
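The classic Unix pattern of fork plus a pipe looks roughly like this in Ruby. It is a minimal sketch assuming a POSIX platform (Process.fork is not available on Windows and is generally unsupported on JRuby); the summation merely stands in for real CPU-bound work.

```ruby
reader, writer = IO.pipe

pid = fork do
  # Child: starts with a copy of the parent's memory, then shares nothing.
  reader.close
  result = (1..1_000_000).sum   # placeholder for CPU-bound work
  writer.puts result
  writer.close
end                             # the child exits when the block finishes

# Parent: close its copy of the write end so read sees EOF once the child is done.
writer.close
answer = reader.read.to_i
reader.close
Process.wait(pid)               # reap the child process

puts answer
```

The child runs in its own address space, so it could run on another core in parallel with the parent; the only data exchanged is what is explicitly written to the pipe.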

Recently, languages such as Erlang have gained a lot of interest by also using a share-nothing approach (so-called "lightweight processes") combined with an easy IPC method. These "lightweight processes" are not OS processes; they actually live inside the same address space. They are called "processes" because they cannot look into each other's memory areas. The "lightweight" comes from the fact that they are handled by a user space scheduler. For a long time, this meant that Erlang had the same problems as other user space threaded systems: no support for multiple cores or multiple CPUs, and blocking syscalls would block all threads. Recently, though, this was solved by adopting an M:N approach: the Erlang runtime now uses multiple kernel threads, each running a user space scheduler. This means that Erlang now gets to take advantage of multiple cores and CPUs without changing its execution model.

Luckily for the Ruby community, the Ruby team is aware of this and is considering such a future for Ruby:
 [...] if we have multiple VM instance on a process, these VMs can be run in parallel. I'll work on that theme in the near future (as my research topic).

 [...] if there are many many problems on native threads, I'll implement green thread. As you know, it's has some benefit against native thread (lightweight thread creation, etc). It  will be lovely hack (FYI. my graduation thesis is to implement userlevel thread library on our specific SMT CPU).
This indicates that a user space (green) threads version of Ruby is not off the table, particularly in light of problems with threading implementations on different OSes, such as this one:
Programming on native thread has own difficulty. For example, on MacOSX, exec() doesn't work (cause exception) if other threads are running (one of portability problem). If we find critical problems on native thread, I will make green thread version on trunk (YARV).
Why is there a need for Sasada Koichi's Multiple VM (MVM) solution? Running multiple Ruby interpreters and having them communicate via IPC methods (e.g. sockets) is possible today as well. However, it comes with a host of problems:
  • The Ruby process needs to exec a new Ruby interpreter, which means it needs to know how it was launched (which Ruby executable to use). This quickly becomes difficult to do in a portable way. For instance: if JRuby is used, the executable needs to be "jruby". Worse: the JVM or application server running it might not allow running outside programs.
  • The new Ruby interpreter needs to be set up with the correct environment variables, load paths and include paths, and needs to be told which main .rb file to execute.
  • Communication can happen via DRb, but this has to go through the network, which is the only portable means of IPC.
  • Network communication means negotiating ports (which port should the "server" part of the two programs listen to).
  • Network communication also means potential problems with firewalls that complain about programs opening connections or opening ports.
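To give a feel for the manual setup these points describe, here is a minimal sketch that launches a second interpreter and talks to it. It assumes Ruby 1.9.2 or later for RbConfig.ruby; the WORKER_ID variable and the doubling task are hypothetical, and talking over the child's stdin/stdout is a swapped-in choice for brevity, not the DRb-over-network setup discussed above.

```ruby
require "rbconfig"

# Locate the interpreter executable that is running this script.
ruby = RbConfig.ruby

# Everything the child needs must be passed explicitly: environment,
# load path (-I) and the code (or main .rb file) to run.
env = { "WORKER_ID" => "1" }   # hypothetical variable, for illustration
child_code = 'puts "worker #{ENV["WORKER_ID"]}: #{$stdin.gets.to_i * 2}"'

output = IO.popen(env, [ruby, "-I", Dir.pwd, "-e", child_code], "r+") do |io|
  io.puts 21        # send a request to the child
  io.close_write    # signal end of input
  io.read           # collect the child's reply
end

puts output
```

Even this small case has to rediscover the executable, rebuild the environment and load path, and define a wire format, which is the overhead that a Thread.new call or an in-process VM avoids.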
Of course, these issues make this approach much more complicated than its Thread equivalent, firing off a new thread of execution:
x = Thread.new {
  p "hello"
}
x.join  # wait for the thread to finish
Or this Erlang sample:
PidX = spawn(Node, Mod, Func, Args).
This Erlang code spawns a new lightweight process, and that is indeed all the code that's needed: all the setup is taken care of, and none of the problems explained above arise.
The pid is a handle to the new process and allows, for instance, simple communication:
PidX ! a_message.
This sends a simple message to the process whose pid is stored in PidX. A message can be any Erlang term, for instance an atom, Erlang's version of Ruby's symbols.
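For comparison, an Erlang-style mailbox can be approximated inside a single Ruby process with a thread and a Queue. This is only an analogy: spawn_actor is a hypothetical helper, not a Ruby API, and unlike Erlang processes these threads share memory, so nothing enforces the share-nothing discipline.

```ruby
# Hypothetical helper: start a thread that owns a mailbox (a thread-safe Queue)
# and passes every message to the handler until it receives :stop.
def spawn_actor(&handler)
  mailbox = Queue.new
  thread = Thread.new do
    loop do
      msg = mailbox.pop        # blocks until a message arrives
      break if msg == :stop
      handler.call(msg)
    end
  end
  [mailbox, thread]
end

received = []
mailbox, worker = spawn_actor { |msg| received << msg }

mailbox << :a_message          # loosely corresponds to: PidX ! a_message
mailbox << :another_message
mailbox << :stop
worker.join

p received
```

The Queue plays the role of the process mailbox, and symbols stand in for Erlang atoms.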

IPC as simple as this is certainly possible in Ruby too. Erlectricity is a new library that permits communication between Erlang and Ruby, but it could just as well be used for communication between Ruby VMs. Erlang IPC is particularly interesting because it uses pattern matching on incoming messages, which makes message passing very concise.
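Erlang's receive dispatches on the shape of each incoming message. A rough Ruby counterpart, assuming Ruby 3.0 or later for case/in pattern matching, might look like this; the message shapes (:add, :greet, :stop) are made up for illustration.

```ruby
# Messages are arrays tagged with a symbol, loosely like Erlang tuples.
def handle(message)
  case message
  in [:add, Integer => a, Integer => b]   # bind both operands
    a + b
  in [:greet, String => name]
    "hello, #{name}"
  in [:stop]
    :stopped
  else                                    # anything else, like a catch-all clause
    :unknown_message
  end
end

p handle([:add, 2, 3])
p handle([:greet, "matz"])
p handle([:stop])
p handle(:garbage)
```

Each clause both checks the structure of the message and extracts its fields, which is what keeps Erlang-style message handlers short.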

The Ruby MVM is certainly the most promising idea for the future of Ruby threading. It avoids the problems of the GIL and of manually wrangling Ruby processes, and it uses the share-nothing ideas that make Erlang and other systems appealing for concurrency.

Among the stable Ruby implementations, JRuby is the only one that uses kernel threads, mostly because it runs on the JVM, which supports them. The cost of creating kernel threads is somewhat offset by the use of thread pools (threads are created once and kept around for reuse). Details of IronRuby's threading support aren't known yet, but since the CLR and JVM are quite similar, it's likely that kernel threads will be used there too.
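The thread pool idea can be sketched in a few lines of Ruby. TinyPool is a hypothetical class, not a JRuby or JVM API (on the JVM, java.util.concurrent executors fill this role); it only shows how a fixed set of threads is created once and then reused for many jobs.

```ruby
# Hypothetical fixed-size pool: the thread creation cost is paid once, up front.
class TinyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker keeps taking jobs until it sees the :shutdown marker.
        while (job = @jobs.pop) != :shutdown
          job.call
        end
      end
    end
  end

  def submit(&job)
    @jobs << job
  end

  # One marker per worker, then wait for all of them to finish.
  def shutdown
    @workers.size.times { @jobs << :shutdown }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = TinyPool.new(4)
10.times { |i| pool.submit { results << i * i } }
pool.shutdown

squares = []
squares << results.pop until results.empty?
p squares.sort
```

Ten jobs run on only four threads; the jobs may complete in any order, which is why the results are sorted before printing.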

One way to prototype and experiment with the idea of a Ruby MVM would be to launch multiple instances of JRuby in the same JVM process and have them communicate. This would effectively provide the same cheap IPC as an MVM: data can be passed simply by passing a pointer, as long as the data is read-only.

Ola Bini recently wrote about his new jrubysrv idea, which allows multiple JRuby instances to run in one JVM to save memory.

As it stands, the details of future thread support in Ruby are still undecided and might be quite different across the alternative implementations.
