InfoQ

JRuby 1.1RC2 released with reduced memory requirements

Posted by Werner Schuster on Feb 21, 2008 09:00 AM

The second Release Candidate for JRuby 1.1 (RC2) has been released, and it's quite an improvement over RC1:
- 260 issues resolved since JRuby 1.1RC1
- Large IO refactoring
- Memory improvements for JIT'd methods:
   - Control total number of methods JIT'd
   - Support a JIT cache between runtimes to return permgen
   - Reduce codesize of generated methods (50-70% reduction)
Next to the Java port of the Oniguruma Regex engine, the most significant performance improvement of JRuby 1.1 over JRuby 1.0 is the introduction of the Just In Time (JIT) compiler, which compiles Ruby code to JVM bytecode. However, the JIT also exposes the problems that a language implementation on the JVM has to deal with.

One source of trouble for JRuby's JIT is the way the JVM manages bytecode. The smallest loadable unit of bytecode in the JVM is a class, so when a Ruby method is JITed, the generated code is put into a method body in a new class, which is then loaded. This can leak memory: loaded bytecode lives in the PermGen (permanent generation), a memory area managed by the garbage collector that is quite small by default, usually 64 MB. Nick Sieger explains how quickly JITed Ruby methods alone can fill it up:
Consider a non-trivial Rails application that makes liberal usage of the Ruby standard library, and also uses a handful of plugins, and the number of methods available for JRuby to compile can easily exceed 10,000. If the average overhead of a single JRuby method class is around 8K (varying due to method size, of course), this would occupy up to 80 megabytes of permgen space. (By contrast, the JVM’s default size of the permgen space is 64 megabytes, so we’re already over the limit).
[..]
If you were to deploy 4 Rails applications each with 4 active runtimes into a single application server, you’re looking at almost 1.2 gigabytes of permgen space necessary to run your applications! (Usually, it’s common to run multiple applications in a Java application server, but with Rails applications that may need to be reconsidered.)
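The arithmetic in the passage above can be checked with a short sketch. It assumes that 8K means 8 × 1024 bytes and that every runtime JIT-compiles all 10,000 methods without sharing anything. The class and method names are made up for illustration and are not part of JRuby.

```java
// Back-of-the-envelope permgen estimate using the figures quoted above.
// Class and method names are illustrative assumptions; none of this is JRuby code.
class PermGenEstimate {

    /** Bytes of permgen used by one runtime that has JIT-compiled `methods` methods. */
    static long perRuntimeBytes(long methods, long bytesPerMethodClass) {
        return methods * bytesPerMethodClass;
    }

    /** Bytes of permgen when every runtime of every application holds its own copies. */
    static long totalBytes(int applications, int runtimesPerApp, long perRuntimeBytes) {
        return (long) applications * runtimesPerApp * perRuntimeBytes;
    }

    public static void main(String[] args) {
        long perRuntime = perRuntimeBytes(10_000, 8 * 1024); // 10,000 methods at ~8K each
        long total = totalBytes(4, 4, perRuntime);           // 4 applications x 4 runtimes
        System.out.printf("per runtime: %.1f MB%n", perRuntime / (1024.0 * 1024.0));
        System.out.printf("total:       %.2f GB%n", total / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Under these assumptions a single runtime needs about 78 MB and the 16 runtimes together need about 1.22 GB, close to the figures quoted.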
This is a very real problem: like the regular Java heap, the PermGen has a fixed maximum size, and once it is full, the JVM throws an OutOfMemoryError and is eventually terminated.
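To see how much room class data has on a given JVM, the standard java.lang.management API can list the non-heap memory pools. The sketch below is only a diagnostic aid, not something JRuby does. Pool names vary by JVM and version: older HotSpot releases report a pool with "Perm Gen" in its name, while Java 8 and later report "Metaspace" instead. The class name NonHeapPools is an assumption made for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists the JVM's non-heap memory pools with their current and maximum sizes.
// NonHeapPools is a hypothetical helper name; the management API calls are standard.
class NonHeapPools {

    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip the regular heap generations
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            lines.add(pool.getName() + ": used=" + (usage.getUsed() / 1024) + "K, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```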

Nick Sieger explains the various solutions to this problem in RC2:
Because of this multiplicative cost, shortly after JRuby 1.1RC1 was released we took the somewhat drastic measure of capping the number of methods that each runtime would JIT-compile to 2048. But after a while it became obvious even with a threshold-based approach, JRuby was still wasting a ton of permgen space with duplicate copies of compiled methods. So for 1.1RC2 we introduced a JIT cache that could be set up to be shared among multiple runtimes.
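The two measures described above, a per-runtime cap and a cache shared between runtimes, can be sketched roughly as follows. This is a simplified illustration and not JRuby's implementation: the names SharedJitCache and RuntimeSketch are hypothetical, plain strings stand in for method source and compiled code, and keying the cache by method source is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache of compiled methods, shared by several runtimes.
class SharedJitCache {
    private final Map<String, String> compiled = new ConcurrentHashMap<>();

    /** Returns the cached code for `source`, compiling it only on first request. */
    String getOrCompile(String source, Function<String, String> compiler) {
        return compiled.computeIfAbsent(source, compiler);
    }

    int size() {
        return compiled.size();
    }
}

// Hypothetical runtime that JIT-compiles at most `jitMax` distinct methods.
class RuntimeSketch {
    private final SharedJitCache cache;
    private final int jitMax; // e.g. 2048, the cap mentioned above
    private final Map<String, String> local = new HashMap<>();

    RuntimeSketch(SharedJitCache cache, int jitMax) {
        this.cache = cache;
        this.jitMax = jitMax;
    }

    /** Returns compiled code, or null when the cap is reached and the method stays interpreted. */
    String tryJit(String methodSource) {
        String code = local.get(methodSource);
        if (code != null) {
            return code; // already compiled in this runtime
        }
        if (local.size() >= jitMax) {
            return null; // cap reached: fall back to the interpreter
        }
        // The string concatenation is a stand-in for real bytecode generation.
        code = cache.getOrCompile(methodSource, src -> "bytecode(" + src + ")");
        local.put(methodSource, code);
        return code;
    }
}
```

With two runtimes sharing one SharedJitCache, compiling the same method in both yields a single cached copy instead of two, which is the saving the shared cache is meant to provide.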

A solution to this problem already exists on the .NET platform in the form of Dynamic Methods. Instead of compiling Ruby methods into Java classes with a single method body, the bytecode would be stored in a method object, with the emphasis on object. These Dynamic Methods behave just like regular objects and are garbage collected once they are no longer reachable. This approach would also get rid of a lot of other overhead, as John Rose explains:
One pain point in dynamic language implementation is managing code dynamically. While implementor’s focus is on the body of a method, and the linkage of that body to some desired calling sequence, there is a host of surrounding details required by the JVM to properly place that code. These details include:
  • method name
  • enclosing class name
  • various access restrictions relative to other named entities
  • class loader and protection domain
  • linkage and initialization state
  • placement in class hierarchy (even if the class is never instantiated)

These details add noise to the implementor’s task, and often enough they cause various execution overheads. Because class of a given name (and class loader) must be defined exactly once, and must afterwards be recoverable only from its name (via Class.forName) the JVM must connect each newly-defined class to its defining class loader and to a data structure called the system dictionary, which will handle later linkage requests. These connections take time to make, especially since they must grab various system locks. They also make it much harder for the GC to collect unused code.
Unfortunately, no feature like .NET's Dynamic Methods is available on the JVM yet. Research is under way in the Da Vinci Machine project, with prototypes available, but it remains to be seen whether and when such a feature will make it into a Java release.
