Rubinius Internals: Threading, ObjectSpace, Debugging

Continuing from Part 1 of the Rubinius interview, Part 2 now goes into some implementation details.

Ruby 1.8.x uses userspace threads, which means it can't make use of multiple cores because the OS only sees and schedules one thread. Rubinius currently uses userspace threads as well, but Evan ponders other solutions:

Implementing multiple interpreters in the same address space is trivial for rubinius. The entire thing was written to be native thread-safe and reentrant. This came out of my experience working on sydney, which was a cleanup of 1.8.2.
You can easily create 2 machines (the base data structure) and initialize them both. They'll remain totally independent then. The only work to be done would be making sure both machines get their code scheduled properly. You could even fire up 2 machine instances in different native threads and let them communicate through channels, which would give you real multiprocessor enabled ruby.
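The arrangement Evan describes, two independent machines that only talk through channels, can be sketched in plain Ruby. This is an illustration only and not Rubinius code: the Machine class is a hypothetical name, Ruby's Queue stands in for a Rubinius channel, and under MRI the two threads still do not run Ruby code in parallel.

```ruby
require "thread" # Queue; a no-op on recent Rubies

# Hypothetical stand-in for a Rubinius machine: private state plus an inbox.
class Machine
  attr_reader :inbox

  def initialize(name)
    @name  = name
    @inbox = Queue.new # the only way other machines can reach this one
  end

  # Run this machine's loop in its own thread and answer on reply_to.
  def start(reply_to)
    Thread.new do
      while (msg = @inbox.pop) != :stop
        reply_to << "#{@name} got #{msg}"
      end
    end
  end
end

def run_two_machines
  results = Queue.new
  a = Machine.new("a")
  b = Machine.new("b")
  threads = [a.start(results), b.start(results)]

  a.inbox << 1
  b.inbox << 2
  a.inbox << :stop
  b.inbox << :stop
  threads.each(&:join)

  [results.pop, results.pop].sort
end
```

Because the machines share nothing except the queues, neither needs a lock around its own state, which is the property that would let each one live in its own native thread.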

Ruby includes a simple debugger in its standard distribution. It's implemented using the tracing feature, which invokes a callback registered with the set_trace_func method before each new line of code is executed (the same feature is also used for profiling). The problem with this approach is its overhead: for every line of Ruby code that runs, the tracing function is invoked and has to decide whether to suspend execution at that point. There are other solutions using native extensions, like ruby-debug or the Ruby in Steel Cylon debugger.
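A minimal sketch of the tracing approach shows where the overhead comes from. It uses the real Kernel#set_trace_func; the helper name trace_lines and its return value are made up for this illustration, and it assumes an MRI-compatible Ruby.

```ruby
# Run the given block with a line-tracing hook installed.
# Returns [number of line events checked, line numbers that matched a breakpoint].
def trace_lines(breakpoint_lines)
  checks = 0
  hits = []
  hook = proc do |event, _file, line, _id, _binding, _klass|
    next unless event == "line"
    checks += 1                                   # paid for every traced line
    hits << line if breakpoint_lines.include?(line)
  end

  set_trace_func(hook)
  begin
    yield
  ensure
    set_trace_func(nil)                           # always remove the hook
  end
  [checks, hits]
end

# No breakpoints are set, yet the hook still runs for every line.
checks, hits = trace_lines([]) do
  a = 1
  b = a + 1
  b * 2
end
```

Even with an empty breakpoint list, checks grows with every executed line, which is the cost the full speed breakpoints described next are meant to avoid.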

Ruby compilers such as Ruby.NET, IronRuby or XRuby simply generate debugging information in the target IL or bytecode and make use of the debugging features of the respective VMs.
Rubinius recently also gained debugging support with an implementation of breakpoints that only imposes an overhead if a breakpoint is hit. Evan explains the implementation:

I've already implemented the basic debugging facility, which gives us full speed breakpoints. Full speed breakpoints mean that the debugger imposes no speed penalty on the runtime for using the debugger. I see this as a HUGE win, because the speed of ruby's debugger has always been a gripe I hear. I've begun to build the higher level functionality on top of the FSB's, and eventually, I'll probably wire it into something like ruby-debug, or at least something that feels like ruby-debug.

Technically, FSB's work by doing bytecode replacement. So when a breakpoint is set, the system uses its reflection to find the exact CompiledMethod object where the breakpoint needs to be set. It then calculates where in the bytecode the breakpoint needs to happen, and replaces the current instruction with a magic one, called yield_debugger. When that instruction is hit, it passes control to a debugger which is attached to the currently running thread. When that method needs to continue, the old instruction is swapped back in and the instruction pointer is rewound by 1, then reactivated.
This works really well, because the debugger simply sits idle, waiting for the thread running the real code to contact it. This also works really well because rubinius has first class method contexts. A method context is the same thing as a stack frame; it describes the state of running a method. In rubinius, you can ask the VM to hand you a method context for any state in the system. That object can then be inspected to find out exactly what was going on at that point in time. The simplest way to get a method context is to call "MethodContext.current", which returns the method context for the currently executing method.
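The instruction-replacement idea can be sketched with a toy stack machine. This is not the Rubinius VM or its bytecode: ToyMethod, ToyVM and the :push and :add opcodes are hypothetical, and only the name yield_debugger is taken from the description above. The breakpoint here is one-shot, since the original instruction is restored and not re-armed.

```ruby
# Toy stand-in for a compiled method: an array of [opcode, operand] pairs.
class ToyMethod
  attr_reader :code

  def initialize(code)
    @code  = code
    @saved = {} # ip => original instruction hidden behind a breakpoint
  end

  # Overwrite the instruction at ip with the magic breakpoint instruction.
  def set_breakpoint(ip)
    @saved[ip] = @code[ip]
    @code[ip] = [:yield_debugger, nil]
  end

  # Swap the original instruction back in.
  def restore(ip)
    @code[ip] = @saved.delete(ip)
  end
end

class ToyVM
  def initialize(debugger)
    @debugger = debugger # any callable taking (ip, stack)
  end

  def run(method)
    stack = []
    ip = 0
    while ip < method.code.size
      op, arg = method.code[ip]
      ip += 1
      case op
      when :push then stack << arg
      when :add  then stack << stack.pop + stack.pop
      when :yield_debugger
        ip -= 1                       # rewind the instruction pointer by 1
        @debugger.call(ip, stack.dup) # hand control to the debugger
        method.restore(ip)            # original instruction runs on the next pass
      end
    end
    stack.last
  end
end

seen = []
method = ToyMethod.new([[:push, 2], [:push, 3], [:add, nil]])
method.set_breakpoint(2)
result = ToyVM.new(->(ip, stack) { seen << [ip, stack] }).run(method)
```

The interpreter loop contains no per-instruction breakpoint check; code without a patched instruction runs exactly as it would without a debugger attached.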

With the advent of Ruby implementations on managed runtimes such as the JVM and the CLR, Ruby's ObjectSpace feature has become a bit of a problem. ObjectSpace allows access to all reachable objects in a Ruby heap, e.g.:

ObjectSpace::each_object(Class){|x| 
  p x
}

prints all Class objects in the current Ruby heap.

JRuby's Ola Bini recently wrote about the performance impact of ObjectSpace on JRuby. Since the JVM (or the CLR) doesn't allow direct access to the heap, it's necessary to track every object creation and hold a list of all live objects. Evan explains the ObjectSpace situation in rubinius:

We actually haven't implemented it yet. We won't have as much trouble as jruby implementing it, because we have direct access to the memory location of objects.

Smalltalk implements this behavior mainly using a single primitive, called next_object. When you call next_object on most any object, it returns the object right after that one. Now, the definition of right after is implementation dependent, but usually means the object in memory right after the current one.
It should be noted this is NOT in any way guaranteed to be lossless, or accurate, and can still require some overhead to accomplish in a way that doesn't baffle the developer using the interface. It can be baffling because VM's (rubinius included) depend on the fact that they can rearrange objects in memory without causing any problems with the already running code.

Rubinius does this a lot actually, because one of the garbage collectors is a copy/compact collector, so young objects are constantly moving. And when you call next_object on an object one time and it returns object B, you can never depend on the fact that the next time you call next_object, object B will again be returned.
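Why next_object is unstable under a moving collector can be shown with a toy heap. This is a sketch under loud assumptions: ToyHeap and copy_collect are hypothetical names, the "heap" is just an array of slots, and the caller passes in the survivors where a real collector would trace them.

```ruby
class ToyHeap
  def initialize
    @slots = [] # position in this array plays the role of a memory address
  end

  def allocate(obj)
    @slots << obj
    obj
  end

  # Return whatever sits in the slot right after obj, or nil at the end.
  def next_object(obj)
    index = @slots.index { |slot| slot.equal?(obj) }
    index && @slots[index + 1]
  end

  # Simulate a copying collection: only the survivors are copied,
  # packed together, into a fresh space.
  def copy_collect(survivors)
    @slots = survivors.dup
  end
end

heap = ToyHeap.new
a = heap.allocate("a")
b = heap.allocate("b")
c = heap.allocate("c")

before = heap.next_object(a) # b sits right after a
heap.copy_collect([a, c])    # b did not survive, so c moves up next to a
after = heap.next_object(a)  # now c
```

The same call on the same object returns a different neighbor after a collection, so code iterating with next_object cannot assume a stable order.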

One thing related to this that rubinius has recently done is decouple object_id's from location in memory. MRI's object_id returns the address in memory of the object, so that later on, ObjectSpace._id2ref can be called with that number and return the object. _id2ref has a really easy time of this in MRI, because it just grabs the object at that location in memory. But anyone that has used _id2ref a lot knows that sometimes you get back a totally different object than you put in. This is because the old object has died and a new object has been allocated in the same place. This doesn't keep people from using _id2ref, but they should be aware that it doesn't work the way they think it works.

Anyway, rubinius currently does not support _id2ref, because object_id's are not memory locations. We'll figure out some way to support it, but like jruby, it might end up being pure overhead.
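The address-reuse problem, and one possible way ids could be decoupled from addresses, can be sketched with two toy tables. Neither is MRI's or Rubinius's actual implementation, and the interview does not say how Rubinius assigns ids; AddressIdHeap and CounterIdTable are hypothetical names used only for illustration.

```ruby
# Ids are slot addresses, and freed slots are reused by later allocations.
class AddressIdHeap
  def initialize
    @slots = []
  end

  def allocate(obj)
    addr = @slots.index(nil) || @slots.size # reuse the first free slot
    @slots[addr] = obj
    addr                                    # the id is the address
  end

  def free(addr)
    @slots[addr] = nil
  end

  def id2ref(addr)
    @slots[addr]
  end
end

# Ids come from a counter and are never reused; lookup needs an extra table.
class CounterIdTable
  def initialize
    @next_id = 0
    @objects = {}
  end

  def allocate(obj)
    id = (@next_id += 1)
    @objects[id] = obj # this table is the extra bookkeeping
    id
  end

  def free(id)
    @objects.delete(id)
  end

  def id2ref(id)
    @objects[id]
  end
end

heap = AddressIdHeap.new
old_addr = heap.allocate("old")
heap.free(old_addr)
heap.allocate("new")
stale_from_heap = heap.id2ref(old_addr) # "new": a different object

table = CounterIdTable.new
old_id = table.allocate("old")
table.free(old_id)
table.allocate("new")
stale_from_table = table.id2ref(old_id) # nil: the id is simply gone
```

The counter-based table never hands back the wrong object, but every lookup goes through a structure that exists only to support _id2ref, which is the kind of overhead the quote anticipates.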
