Clustered JRuby - Transparent Clustering of JRuby with Terracotta

The MagLev demo at RailsConf 2008 demonstrated Gemstone's distributed VM technology, which allows the same object memory to be shared transparently across multiple Gemstone VMs.

Terracotta is a Java technology that allows something similar. Fabio Kung has now started experimenting with letting JRuby take advantage of Terracotta. Similar projects have been attempted in the past: Gemstone experimented with supporting JRuby on top of their Java-based product, and there was a previous attempt at running JRuby on Terracotta, although that hasn't been updated in some time.

We talked to Fabio Kung about his project, which he refers to as "JMaglev", and asked what's involved in getting JRuby and Terracotta to work together and which problems need to be solved before it will work.


First off, Fabio explains his implementation and how he had to modify JRuby to make it work:

I'm using Terracotta POJO clustering to make JRuby internals shared to all cluster nodes. In fact, each runtime has a collection of global variables, something like:

public class Ruby {
  // ...
  private GlobalVariables globalVariables = new GlobalVariables();
}
and:
public class GlobalVariables {
  // ...
  private List values = new ArrayList();
}

Terracotta is just clustering this global variables list. Any change to that list is replicated to all JRuby runtimes in the cluster. The beauty is that you can add _any_ ruby object to this list, even complicated objects like regexps, hashes and procs. All global variables are automatically shared and any object referenced by a global variable is instrumented by Terracotta to be clustered.
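To illustrate Fabio's point about which objects can live in global variables, here is a minimal plain-Ruby sketch. Nothing is clustered in it: it only shows a hash, a regexp and a proc reachable from globals, which is the shape of data that, by Fabio's description, the patched JRuby would replicate to other runtimes. The names `$settings`, `$word`, `$shout` and `greet` are made up for this example.

```ruby
# Plain Ruby sketch: no clustering happens here. The assumption is that,
# under the patched JRuby described above, these globals would be visible
# from every runtime in the cluster.

# A hash, a regexp and a proc, all reachable from global variables.
$settings = { "greeting" => "hello", "retries" => 3 }
$word     = /\A[a-z]+\z/
$shout    = proc { |text| text.upcase + "!" }

# Hypothetical helper that any node could run against the shared globals.
def greet(name)
  return "invalid name" unless $word =~ name
  $shout.call("#{$settings["greeting"]} #{name}")
end

puts greet("fabio")  # prints HELLO FABIO!
```

The code itself contains no clustering calls, which is the sense in which the approach is described as transparent.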

To support it, I had to patch JRuby, making it "clusterizable". In fact, every ruby object in JRuby maintains a hard reference to the ruby runtime. Shared objects must be used in many different runtimes, so JRuby must support some kind of runtime attachment/detachment. I was able to make it work, but only with one Ruby Runtime per JVM. There are still unresolved issues in this area that need to be discussed. Here are some:

- global object identities: should object_id be the same in all nodes?
- shared metaclasses: what happens when the object class, superclasses or included modules are changed in different nodes?
- support for many runtimes per JVM

I took simple solutions to all of these problems, but each solution would require an entire blog post. ;-)
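The shared-metaclass question Fabio raises is easier to see with a small example. The sketch below runs in a single ordinary Ruby process and uses made-up names (`Account`, `Audited`, `$account`). It shows that changing a class after an object exists immediately changes that object's behaviour. The open question in a cluster is what other nodes, which never ran the modifying code, should see for the same shared object. The sketch does not answer that.

```ruby
# Single-process sketch with hypothetical names; nothing here is clustered.
class Account
  def initialize(balance)
    @balance = balance
  end

  def summary
    "balance: #{@balance}"
  end
end

# Imagine this object is shared with other nodes through a global.
$account = Account.new(100)
before = $account.summary

# Imagine only one node runs the following: it mixes a module into the class.
module Audited
  def summary
    "[audited] " + super
  end
end

class Account
  prepend Audited
end

# The already-existing object picks up the new behaviour in this process.
after = $account.summary

puts before
puts after
```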


Fabio explains the use cases he has in mind for JRuby and Terracotta:

In conjunction with Terracotta's High Availability mode, I think "JMaglev" (perhaps it needs a better name) can indeed be a reliable alternative to memcached without being intrusive in the Ruby code. But there's a lot of work to be done. That is the reason I have made it public; anyone who wants can contribute.
http://github.com/fabiokung/clustered-jruby/

Many servers can be configured in Terracotta, with one being the "master" (or active server) and others in hot-standby mode. Here is where things start to become more interesting, because if the active server crashes, another automatically takes its place to increase availability. There is even a mode available in the Enterprise version of Terracotta that enables multiple active servers, similar to what is achieved with memcached, but memcached doesn't make objects persistent.

Terracotta might act as a distributed cache and doesn't use Java serialization: it just replicates what changes. You have only to make objects you would retrieve from the database shared to all cluster nodes. With JMaglev, it means you have just to put them in global variables - $shared = Person.find(:all).
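The caching idea Fabio describes can be sketched in plain Ruby as follows. The `Person` class is a stand-in written for this example, not an ActiveRecord model, and `find_all` and `people` are hypothetical names. In an ordinary Ruby process the global is only visible locally. The assumption, taken from Fabio's description, is that under clustered JRuby the same `$shared` would be visible to every node, so the lookup would run once for the whole cluster.

```ruby
# Stand-in for a database-backed model; counts how often the "database" is hit.
class Person
  @queries = 0

  class << self
    attr_reader :queries

    # Hypothetical finder that pretends to load all rows.
    def find_all
      @queries += 1
      ["Alice", "Bob"]
    end
  end
end

# Load once, then serve every later call from the global variable.
def people
  $shared ||= Person.find_all
end

people
people
puts Person.queries  # prints 1
```

Note that `||=` is not atomic, so concurrent first calls could both hit the database. How that interacts with Terracotta's locking is not covered by the source.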

Another possible use case is HttpSession sharing among many processes and machines in Rails applications. People deploying Rails applications to JRuby could use the transparent clustered objects to keep HttpSessions shared across all cluster nodes.

In fact, any Terracotta use case is a JMaglev use case. Honestly, I've done it just because it can be done. It's pretty similar to Avi Bryant's MagLev case: he said it would be possible to use Smalltalk VMs to run Ruby code, then the Gemstone guys called him to prove it could indeed be done. ;-)

I'm expecting people more creative than I am to come up with more creative use cases for "JMaglev".


Distributed object memory is just one of the features of Gemstone/S (and MagLev); another important feature is persistence. As Gemstone's Monty Williams explains in a recent episode of the Rails Podcast, Gemstone/S supports persisting the object memory, which means there's no need for an ORM or even an RDBMS to store data.
When asked whether "JMaglev" could support something similar, Fabio explains:

All shared ruby objects live in the Terracotta Server, which can automatically persist those objects, even when they aren't Serializable. Clients hold stubs to the real, shared objects. You just have to configure the server to be in persistent mode. I haven't tested it yet, but it should be one line of XML configuration.

I think Terracotta could be used as an OODB to persist JRuby objects, although I don't think that is currently its primary goal. Terracotta already persists shared objects through its High Availability mode, which exists for fail-safe, highly available deployments.
http://www.terracotta.org/web/display/docs/Configuring+Terracotta+For+High+Availability.


Terracotta's website lists many Terracotta Integration Modules (TIM), some of them for popular ORM solutions. When asked if these could help with persistence, Fabio explains that these TIMs serve a different purpose:

Those Terracotta Integration Modules (TIM) aren't related to automatic persistence for shared objects. They just help Terracotta to be used in conjunction with those ORM frameworks. The Hibernate TIM, for example, hasn't anything to do with persistence. It just enables Hibernate to use a clustered (distributed) EhCache (and others) with little effort, without having to resort to true distributed caches like JBoss TreeCache and memcached.

Fabio has published a screencast showing how the sharing with JRuby and Terracotta works. To try it yourself, see Fabio's clustered-jruby repository on GitHub, which provides everything necessary.
