JRuby 1.1RC2 released with reduced memory requirements
- 260 issues resolved since JRuby 1.1RC1Next to the Java port of the Oniguruma Regex engine, the most significant performance improvement of JRuby 1.1 over JRuby 1.0 is the introduction of the Just In Time (JIT) compiler, which compiles Ruby code to JVM bytecodes. However, it also shows the problems that a JVM language implementation has to deal with.
- Large IO refactoring
- Memory improvements for JIT'd methods:
- Control total number of methods JIT'd
- Support a JIT cache between runtimes to return permgen
- Reduce codesize of generated methods (50-70% reduction)
One thing causing problems for JRuby's JIT is the way bytecode is managed in the JVM. The smallest loadable unit of bytecode in the JVM is a class - so if a Ruby method is JITed, the generated code is put into a method body in a new class, which is then loaded. However, this is a potential source of problems and a memory leak: bytecode is loaded into the PermGen, a Garbage Collector generation, which by default is quite small, usually 64 MB. Nick Sieger explains how quickly this could be filled up just with JITed Ruby methods:
Consider a non-trivial Rails application that makes liberal usage of the Ruby standard library, and also uses a handful of plugins, and the number of methods available for JRuby to compile can easily exceed 10,000. If the average overhead of a single JRuby method class is around 8K (varying due to method size, of course), this would occupy up to 80 megabytes of permgen space. (By contrast, the JVM’s default size of the permgen space is 64 megabytes, so we’re already over the limit).This is a very real problem - the PermGen behaves just like the regular Java heap: it has a fixed size, and once the PermGen is full, an OutOfMemory exception is thrown and eventually the JVM is terminated.
If you were to deploy 4 Rails applications each with 4 active runtimes into a single application server, you’re looking at almost 1.2 gigabytes of permgen space necessary to run your applications! (Usually, it’s common to run multiple applications in a Java application server, but with Rails applications that may need to be reconsidered.)
Nick Sieger explains the various solutions to this problem in RC2:
Because of this multiplicative cost, shortly after JRuby 1.1RC1 was released we took the somewhat drastic measure of capping the number of methods that each runtime would JIT-compile to 2048. But after a while it became obvious even with a threshold-based approach, JRuby was still wasting a ton of permgen space with duplicate copies of compiled methods. So for 1.1RC2 we introduced a JIT cache that could be set up to be shared among multiple runtimes.
The solution for this problem is already available as Dynamic Methods on the .NET platform. Instead of compiling Ruby methods into Java classes with a single method body, the bytecode would be stored in a method object - with the emphasis on object. These Dynamic Methods behave just like regular objects, which will be Garbage collected once they're not reachable anymore. This approach would also get rid of a lot of other overhead, as John Rose explains:
One pain point in dynamic language implementation is managing code dynamically. While implementor’s focus is on the body of a method, and the linkage of that body to some desired calling sequence, there is a host of surrounding details required by the JVM to properly place that code. These details include:Of course, a feature like .NET's Dynamic Methods is not available on the JVM. Research is going on in the Da Vinci Machine project, with prototypes available, but it remains to be seen when a feature like that will make it into the next Java release.
- method name
- enclosing class name
- various access restrictions relative to other named entities
- class loader and protection domain
- linkage and initialization state
- placement in class hierarchy (even if the class is never instantiated)
These details add noise to the implementor’s task, and often enough they cause various execution overheads. Because class of a given name (and class loader) must be defined exactly once, and must afterwards be recoverable only from its name (via Class.forName) the JVM must connect each newly-defined class to its defining class loader and to a data structure called the system dictionary, which will handle later linkage requests. These connections take time to make, especially since they must grab various system locks. They also make it much harder for the GC to collect unused code.
Roy Rapoport Aug 28, 2014