Bindings, Platforms, and Innovation
This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.
Tracking change and innovation in the enterprise software development community
Posted by Werner Schuster on May 23, 2007 04:00 PM
A recent interview with Matz (Yukihiro Matsumoto), creator of Ruby, and Sasada Koichi, creator of YARV, tackles the topic of Ruby's handling of threads. Current stable releases of Ruby use user space threads (also called "green threads"), which means that the Ruby interpreter takes care of everything to do with threads. This is in contrast to kernel threads, where creation, scheduling and synchronization are done with OS syscalls, which makes these operations costly, at least compared to their equivalents in user space threads. User space threads, on the other hand, cannot make use of multiple cores or multiple CPUs (because the OS doesn't know about them and thus can't schedule them on these cores/CPUs).

"As you know, YARV support native thread. It means that you can run each Ruby thread on each native thread concurrently.

"It doesn't mean that every Ruby thread runs in parallel. YARV has global VM lock (global interpreter lock) which only one running Ruby thread has. This decision maybe makes us happy because we can run most of the extensions written in C without any modifications."

This means: no matter how many cores or CPUs are available, only one Ruby thread will be able to run at any given time. There are workarounds, and native extensions can handle the Global Interpreter Lock (GIL) in more flexible ways, for instance by releasing it before starting a long-running operation. Sasada Koichi explains the API available for releasing the GIL:

"You must release Giant VM Lock before doing blocking task. If you need do this in extension libraries, use rb_thread_blocking_region() API."
rb_thread_blocking_region(
    blocking_func, /* function that will block */
    data,          /* this will be passed to the above function */
    unblock_func   /* if another thread causes an exception with Thread#raise,
                      this function is called to unblock, or NULL */
)
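The effect of releasing the lock around a blocking task can be observed from plain Ruby. The following is a minimal sketch, not code from the interview: the helper name run_blocking_threads is hypothetical, and it assumes an interpreter that lets other threads run while one of them is blocked in sleep. Under that assumption, several blocking threads should finish in roughly the time of one, even though only one of them executes Ruby code at any given moment.

```ruby
require "benchmark"

# Hypothetical helper: start `count` threads that each block for `seconds`
# and return the values the threads produced, in creation order.
def run_blocking_threads(count, seconds)
  threads = Array.new(count) do |i|
    Thread.new do
      sleep(seconds) # blocking call; other threads may run in the meantime
      i * 2          # becomes the thread's value
    end
  end
  threads.map(&:value) # Thread#value joins the thread and returns its result
end

results = nil
elapsed = Benchmark.realtime { results = run_blocking_threads(4, 0.2) }
puts "results: #{results.inspect}, elapsed: #{elapsed.round(2)}s"
```

If the threads were replaced by CPU-bound loops, the same measurement would be expected to show little or no speedup on an interpreter with a global lock, which is the limitation the article describes.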
"Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.

"Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads.

"Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions."

The benefits of preemptively scheduled threads that share an address space have long been debated. Unix was, for the longest time, single threaded or user space threaded. Parallelism was implemented with multiple processes which communicated via different means of InterProcess Communication (IPC), such as pipes, FIFOs, or explicitly shared memory regions. This was supported by the fork syscall, which made it cheap to duplicate a running process.

"[...] if we have multiple VM instance on a process, these VMs can be run in parallel. I'll work on that theme in the near future (as my research topic)."

"[...] if there are many many problems on native threads, I'll implement green thread. As you know, it's has some benefit against native thread (lightweight thread creation, etc). It will be lovely hack (FYI. my graduation thesis is to implement userlevel thread library on our specific SMT CPU)."

This indicates that a userspace (green) threads version of Ruby is not off the table, particularly in light of implementation problems of threading systems on different OSes, such as this one:

"Programming on native thread has own difficulty. For example, on MacOSX, exec() doesn't work (cause exception) if other threads are running (one of portability problem). If we find critical problems on native thread, I will make green thread version on trunk (YARV)."

Why is there a need for Sasada Koichi's Multiple VM (MVM) solution? Running multiple Ruby interpreters and having them communicate via IPC methods (e.g. sockets) is possible today as well. However, it comes with a host of problems:
x = Thread.new {
  p "hello"
}
Or this Erlang sample:

PidX = spawn(Node, Mod, Func, Args)

This Erlang code spawns a new lightweight process, and indeed, this is all the code that's needed. All the setup code is taken care of, with none of the problems explained above.

PidX ! a_message

This sends a simple message to the process whose pid is stored in PidX. The message can consist of various types, for instance atoms, Erlang's version of Ruby's symbols.
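For comparison, a similar message-passing style can be approximated in Ruby with threads and queues. This is only a sketch under stated assumptions: spawn_worker, inbox and outbox are hypothetical names, the worker is an ordinary Ruby thread rather than an isolated lightweight process, and nothing here reflects how Erlang or any Ruby VM implements messaging.

```ruby
require "thread" # provides Queue on older Rubies; harmless on newer ones

# Hypothetical helper: start a worker that takes messages from `inbox`
# and posts one reply per message to `outbox` until it receives :stop.
def spawn_worker(inbox, outbox)
  Thread.new do
    loop do
      message = inbox.pop            # blocks until a message arrives
      break if message == :stop      # leave the loop and end the thread
      outbox << [:received, message] # reply to the sender
    end
  end
end

inbox  = Queue.new
outbox = Queue.new
worker = spawn_worker(inbox, outbox)

inbox << :a_message # plays the role of the Erlang `!` send
reply = outbox.pop  # wait for the worker's answer
inbox << :stop
worker.join

p reply
```

Because the two sides share nothing but the queues, the locking is confined to Queue itself. Unlike Erlang processes, though, the worker still shares an address space with its creator, so nothing prevents it from touching other objects directly.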
I like Guido's ideas, and they certainly would work for concurrency in the large. However, if you look at the micro sort of parallelism that Fortress has, or that Haskell can do (where even small operations are parallel), IPC is really not suitable for that (think of a multicore CPU, for instance). Ruby is a nice high level language, and I would like to see threads come in in a high level way (if possible), not just do what other languages do for the sake of completeness.
If they are going to use multiple cores, the chance of visibility issues is going to increase as well. It took a long time before Java's memory model was fixed (and a lot of very smart people worked on it). How are these issues going to be tackled in Ruby?
Now here's a level of technical coverage I wouldn't have expected from infoq. Good thing though, these things are important when considering the future of Ruby (including IronRuby and JRuby) for enterprise development. Good work!
I'll second what Stefan said.
Me too. Great article!
Notably missing from the listed Ruby implementations is Rubinius, which as I understand it will support a wide variety of threading models, including green threads and the Erlang lightweight-process model. MenTaLguY is, I believe, pushing this forward, and considering porting his work over to some of the other implementations, too. Hanging around in #rubinius while MenTaLguY is around will probably yield more enlightenment.
This seems to be one of *the* most comprehensive pieces of coverage of threading in Ruby and the latest trends/research. But the problem, as I see it, is that no line of thought or research seems to be heading somewhere concrete. Although Ruby MVM is posed as the best option available, issues with that approach are still mentioned. Also, any successful work on it is hard to find. It seems threading in Ruby will remain experimental in nature and may only improve with the advent of some other language that has a solid implementation of the threading experiments done on the Ruby side.
Antonio: yes, you're right. Some Rubinius coverage would be good, especially now that Evan Phoenix is paid for his Rubinius work. The current state of Rubinius threading seems to be userspace threads, with some more ideas/concepts on the horizon.
Ismail: Ruby's threading is not experimental in Ruby 1.8.x and JRuby 1.0; one has userspace threads, the other has kernel threads. The implementations are solid and their issues are known. Ruby 1.9.x is a bit of a wildcard right now, but that's alright since it's not a release yet; actually I think it's not even an 'alpha' or something like that. If you want a "solid implementation", then just use the existing Ruby 1.8.x or JRuby. If you want a "new" language with a mature multiprogramming story, take a peek at Erlang. Erlang has been used for some highly scalable and rock solid apps. Also: as for using only "solid implementations" ... Java's memory model (which is crucial for threading) was broken for the first 10 years of its existence (it was fixed in 1.5), yet this didn't have a bad impact on the success of Java or its applications. Interesting times are ahead...
I find this topic to be very interesting. Since Ruby 1.9, JRuby and Rubinius have a lot of leeway to experiment with new features like MxN threading or Erlang style multiprocess concurrency, I am confident good things are on the horizon. I think it is particularly important not to fall into the trap of thinking that native threads will be a concurrency panacea for Ruby (or any other language). Erlang definitely seems to have the right idea. BTW, I wrote about this topic on my blog earlier this month: Are Native Threads Worth It?
"The Ruby process needs to exec a new Ruby interpreter, ..." It doesn't seem like that is necessary if you just use fork()? sRp
Oz/Mozart is another language with super lightweight internal threading, very much like Erlang's lightweight threading. They've also gone the route of no SMP, using multiple processes instead. At the moment, I think the main disadvantage of this approach is that it is slightly harder to take advantage of multiple processors/cores. For instance, instead of running a spawn for every element in a list and collecting the results, you'd need to include logic to distribute the spawns to different processes and possibly even worry about dealing with a single process crashing. Most of this could be solved by having a good set of libraries to make the default case easy. The advantage is that it should make it much easier in the future to take fuller advantage of things like NUMA/SUMA/SUMO without having to deal with memory locality issues and expensive barriers. Such libraries should also be beneficial for making some very closely networked machines (such as blades) function together. sRp
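The fork-based approach raised in the last two comments (one process per list element, results collected by the parent) can be sketched in Ruby as follows. This is a minimal illustration under stated assumptions, not a library recommendation: parallel_map is a hypothetical helper, it requires a platform where fork is available, and it does no error handling if a child crashes.

```ruby
# Hypothetical helper: evaluate the block for each item in its own child
# process and return the results in the original order.
def parallel_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close                            # child only writes
      writer.write(Marshal.dump(yield(item))) # send the result to the parent
      writer.close
    end
    writer.close # parent keeps only the read end
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read) # reads until the child closes its end
    reader.close
    Process.wait(pid) # reap the child process
    result
  end
end

squares = parallel_map([1, 2, 3, 4]) { |n| n * n }
p squares
```

Each child is a separate OS process, so the kernel is free to schedule them on different cores regardless of the interpreter's threading model. The price is the explicit serialization over the pipe and the per-process startup cost.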