Ruby 2.2.0 Released, Featuring Incremental and Symbol GC

Ruby 2.2.0, released on December 25th, was the gift rubyists got for Christmas. Its highlights are several garbage collection (GC) improvements: a new incremental GC algorithm, and symbols that can now be garbage collected. Ruby also got a collection of minor improvements to its core classes and standard library.

Following the introduction of generational garbage collection in Ruby 2.1.0, which markedly improved GC throughput, the Ruby maintainers continue to make important changes in this space. The generational GC (RGenGC) classifies objects into generations, on the assumption that most objects die young. This allows frequent, cheap collections of young objects with high throughput and short pauses, because old objects are only examined for deletion when memory runs low. Those full collections, however, still cause long pauses.
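CRuby exposes the split between cheap young-generation collections and full collections through `GC.stat`, whose `:minor_gc_count` and `:major_gc_count` keys have existed since Ruby 2.1. The sketch below allocates many short-lived strings and reports how each counter moves; the loop size is an arbitrary choice, and the exact counts depend on the Ruby version and heap settings.

```ruby
# Snapshot of CRuby's minor/major GC counters (GC.stat keys available since 2.1).
def gc_counts
  stat = GC.stat
  [stat[:minor_gc_count], stat[:major_gc_count]]
end

minor_before, major_before = gc_counts

# Allocate many short-lived strings; none of them survive the loop,
# which is exactly the "most objects die young" case RGenGC optimizes for.
500_000.times { |i| "temp-#{i}" }

minor_after, major_after = gc_counts
puts "minor GCs: #{minor_after - minor_before}, major GCs: #{major_after - major_before}"
```

On a typical run one would expect the minor count to grow much more than the major count, since the garbage here never reaches the old generation.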

The incremental GC (RIncGC), built on top of the generational GC, aims to cut those pauses while maintaining the same throughput. It achieves shorter pauses by interleaving the mark phase, in which live objects are marked so that unmarked ones can be reclaimed, with Ruby's regular execution. Before Ruby 2.2.0, the mark phase ran as one big, uninterrupted step.
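The idea of splitting the mark phase into bounded steps can be sketched with a toy model. This is not CRuby's implementation: the `IncrementalMarker` class, the hash-based heap, and the per-step `budget` are all hypothetical. A real incremental collector must also use write barriers, because the program can change references between steps; this toy does not mutate the heap between steps.

```ruby
require "set"

# Toy model of incremental marking (hypothetical, not CRuby's gc.c): the heap
# is a Hash of object name => names it references.
class IncrementalMarker
  attr_reader :marked, :steps

  def initialize(heap, roots, budget:)
    @heap = heap
    @gray = roots.dup          # reached but not yet scanned
    @marked = Set.new(roots)   # every object reached so far
    @budget = budget           # objects scanned per step
    @steps = 0
  end

  def done?
    @gray.empty?
  end

  # Scan at most @budget objects, then return so the program can run again.
  def step
    @steps += 1
    @budget.times do
      break if @gray.empty?
      @heap.fetch(@gray.pop, []).each do |child|
        @gray.push(child) if @marked.add?(child)
      end
    end
  end
end

heap = {
  a: [:b, :c], b: [:d], c: [:d, :e], d: [], e: [:f], f: [],
  garbage1: [:garbage2], garbage2: [:garbage1]  # unreachable cycle
}
marker = IncrementalMarker.new(heap, [:a], budget: 2)
until marker.done?
  marker.step
  # In a real VM, Ruby code would run here between mark steps.
end
p marker.marked.to_a.sort  # [:a, :b, :c, :d, :e, :f]
p marker.steps             # 3
```

Anything left unmarked when the worklist empties (here the `garbage1`/`garbage2` cycle) can be swept; a smaller budget means more, shorter steps.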

Neither RGenGC nor RIncGC can manage every object, which means that some objects are never promoted to the old generation. This is mostly due to C extensions, as it is impossible to guarantee that all of them respect the constraints RGenGC and RIncGC rely on, namely write barriers. At RubyConf 2014, Koichi Sasada presented a thorough description of both RGenGC and RIncGC. It is an interesting read for those who want to know all of the algorithms' details and performance benchmarks.
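Why a missing write barrier matters can be shown with another toy model. Everything here (`ToyGenHeap`, `write`, `raw_write`) is hypothetical and much simpler than CRuby's gc.c: a minor GC skips old objects, so an old object that gains a reference to a young one must be remembered by the write barrier. A store that bypasses the barrier, as unchecked C code might, lets a live young object be freed. Keeping such "unprotected" objects young, roughly what RGenGC does, is safe because their references are rescanned on every minor GC.

```ruby
require "set"

# Hypothetical toy generational heap, not CRuby's gc.c.
class ToyGenHeap
  def initialize
    @objs = {}             # name => { old: Boolean, refs: [names] }
    @remembered = Set.new  # old objects known to reference young ones
  end

  def new_obj(name, old: false)
    @objs[name] = { old: old, refs: [] }
  end

  def live?(name)
    @objs.key?(name)
  end

  # Reference store with a write barrier: remember old -> young edges.
  def write(from, to)
    @objs[from][:refs] << to
    @remembered << from if @objs[from][:old] && !@objs[to][:old]
  end

  # Reference store that skips the barrier, like C code writing a pointer directly.
  def raw_write(from, to)
    @objs[from][:refs] << to
  end

  # Minor GC: traverse young objects from the roots and from the references of
  # remembered old objects; old objects are assumed live and are not scanned.
  def minor_gc(roots)
    marked = Set.new
    stack = roots.dup
    @remembered.each { |name| stack.concat(@objs[name][:refs]) }
    until stack.empty?
      name = stack.pop
      next if @objs[name][:old] || !marked.add?(name)
      stack.concat(@objs[name][:refs])
    end
    # Free every young object that was not reached.
    @objs.delete_if { |name, obj| !obj[:old] && !marked.include?(name) }
  end
end

heap = ToyGenHeap.new
heap.new_obj(:old1, old: true)
heap.new_obj(:old2, old: true)
heap.new_obj(:shady)  # young object whose stores are unbarriered
%i[y1 y2 y3].each { |n| heap.new_obj(n) }

heap.write(:old1, :y1)       # remembered by the write barrier
heap.raw_write(:old2, :y2)   # missed: old2 is never scanned by a minor GC
heap.raw_write(:shady, :y3)  # safe: shady is young, so its refs are scanned
heap.minor_gc([:old1, :old2, :shady])

p heap.live?(:y1)  # true
p heap.live?(:y2)  # false, freed while still referenced
p heap.live?(:y3)  # true
```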

Stop the World GC vs Incremental GC: RIncGC does multiple small mark phases instead of a big one. Source: Koichi Sasada.

RIncGC eliminates long pauses. Source: Koichi Sasada.

The introduction of GC for symbols, a kind of string identifier, also improves Ruby's memory management. The change is important enough that Ruby on Rails 5.0, targeted for fall 2015, will target only Ruby 2.2+ because of it:

Rails 5.0 will target Ruby 2.2+ exclusively. There are a bunch of optimizations coming in Ruby 2.2 that are going to be very nice, but most importantly for Rails, symbols are going to be garbage collected. This means we can shed a lot of weight related to juggling strings when we accept input from the outside world. It also means that we can convert fully to keyword arguments and all the other good stuff from the latest Ruby.

Until now, symbols could not be garbage collected because Ruby's internals mapped each symbol to an integer. CRuby, the reference implementation written in C, used this integer as the symbol's identity, and C code, including C extensions, could hold on to it. If a symbol were garbage collected in the Ruby world and recreated later on, it would get a different integer ID. Two occurrences of what the Ruby language treats as the same symbol would then have different identities, which is a bug.
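The ID-instability problem can be reproduced with a toy interning table. `ToySymbolTable` and its `collect` method are hypothetical, only loosely in the spirit of CRuby's pre-2.2 symbol table; they model what a naive symbol GC that simply drops entries would do.

```ruby
# Hypothetical interning table that hands out integer IDs; not CRuby's code.
class ToySymbolTable
  def initialize
    @ids = {}      # name => integer ID
    @next_id = 1
  end

  # Return the existing ID for name, or assign the next free one.
  def intern(name)
    @ids.fetch(name) do
      id = @next_id
      @next_id += 1
      @ids[name] = id
    end
  end

  # What a naive symbol GC would do: drop the entry entirely.
  def collect(name)
    @ids.delete(name)
  end
end

table = ToySymbolTable.new
saved_id = table.intern("foo")  # e.g. C code stores this integer
table.collect("foo")            # "foo" is unreferenced on the Ruby side
new_id = table.intern("foo")    # :foo is recreated later
p saved_id == new_id            # false: the stored ID no longer identifies :foo
```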

The simple solution would be to replace the integer in CRuby with a string, thereby making both worlds (C and Ruby) coherent. Again, C extensions complicated matters, as they prevent the runtime from detecting and managing all symbols. The solution was to classify symbols into two groups: immortal and mortal. Immortal symbols continue to use integer IDs and so are never garbage collected. Examples of immortal symbols include method names, variable names, constants and other language elements. Mortal symbols, such as those created with "foo".to_sym, have no integer ID and are thus garbage collectable.

Mortal vs Immortal Symbols. Source: Narihiro Nakamura.
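The effect of mortal symbols is observable from Ruby code via `Symbol.all_symbols`. The sketch below creates a batch of dynamic symbols inside a method, so nothing references them once it returns, then forces a collection with `GC.start`. The symbol names and the count of 50,000 are arbitrary; on Ruby 2.2 or later the count after GC should be well below the peak, whereas earlier versions never free these symbols.

```ruby
# Create `count` dynamic (mortal) symbols and report how many symbols exist
# while they are still referenced.
def symbols_with_temporaries(count)
  temporaries = Array.new(count) { |i| "dyn_sym_#{i}".to_sym }
  [temporaries.size, Symbol.all_symbols.size]
end

_, peak = symbols_with_temporaries(50_000)
GC.start  # full collection; the temporaries are now unreferenced
after = Symbol.all_symbols.size
puts "symbols at peak: #{peak}, after GC.start: #{after}"
```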

At RubyKaigi 2014, Narihiro Nakamura described the symbol GC solution and all the constraints that led to it.

Still on the memory management front, Ruby 2.2.0 can be built to use jemalloc instead of the system's malloc, which may increase speed and decrease memory fragmentation. This is an experimental feature until more performance data and use cases are gathered.
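Whether a given interpreter was built with jemalloc (via the `--with-jemalloc` configure option) can be checked heuristically with `RbConfig`. This is a sketch under an assumption: it expects jemalloc to appear among the libraries recorded at build time, and because the relevant `RbConfig::CONFIG` key may differ between Ruby versions, it checks both `MAINLIBS` and `LIBS`. The helper name `linked_with_jemalloc?` is hypothetical.

```ruby
require "rbconfig"

# Heuristic (assumption): a Ruby configured with --with-jemalloc is expected
# to list jemalloc among its build-time libraries. The key holding it may
# vary by Ruby version, so both candidates are checked.
def linked_with_jemalloc?
  %w[MAINLIBS LIBS].any? { |key| RbConfig::CONFIG[key].to_s.include?("jemalloc") }
end

puts "built with jemalloc: #{linked_with_jemalloc?}"
```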

Process creation methods, such as system() and spawn(), now use vfork(2), if available, instead of fork(2). This change can improve performance, especially when the parent process consumes lots of memory, because vfork does not duplicate the parent's address space. It is also an experimental feature, so it may change in the future.
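Callers do not change anything: `system` and `spawn` are used exactly as before, and the interpreter chooses vfork internally where the platform supports it. The rough timing sketch below keeps a sizeable heap alive in the parent and spawns a few trivial child Rubies, which is the scenario where avoiding fork's copying should help most. The heap size and iteration count are arbitrary, and absolute timings depend heavily on the platform, so it is only useful for comparing Ruby versions on the same machine.

```ruby
require "rbconfig"

# Keep a sizeable heap alive in the parent process.
ballast = Array.new(500_000) { |i| "payload-#{i}" }

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
5.times do
  # Same API as always; vfork vs fork is an internal choice of the interpreter.
  pid = Process.spawn(RbConfig.ruby, "-e", "exit 0")
  Process.wait(pid)
  raise "child exited with #{$?.exitstatus}" unless $?.success?
end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts format("5 spawns with %d live strings: %.3f s", ballast.size, elapsed)
```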

The core libraries now support Unicode 7.0 and include several new methods such as Enumerable#slice_after, Enumerable#slice_when, Float#next_float, Float#prev_float, File.birthtime, File#birthtime and String#unicode_normalize.
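A few of the new methods in action; the input data is made up for illustration. `Enumerable#slice_when` starts a new chunk whenever the block is true for a pair of adjacent elements, `Enumerable#slice_after` ends a chunk after each element matching the block, `Float#next_float`/`Float#prev_float` step to the adjacent representable floats, and `String#unicode_normalize` defaults to NFC.

```ruby
# slice_when: split a sorted list of integers into consecutive runs.
runs = [1, 2, 4, 9, 10, 11, 12, 15].slice_when { |a, b| b != a + 1 }.to_a
p runs  # [[1, 2], [4], [9, 10, 11, 12], [15]]

# slice_after: end a chunk after every word terminated by ";".
statements = %w[foo bar; baz qux;].slice_after { |w| w.end_with?(";") }.to_a
p statements  # [["foo", "bar;"], ["baz", "qux;"]]

# next_float/prev_float: the gap just above 1.0 is exactly Float::EPSILON.
p 1.0.next_float - 1.0 == Float::EPSILON  # true
p 1.0.prev_float < 1.0                     # true

# unicode_normalize: "e" + COMBINING ACUTE ACCENT composes to a single "é".
decomposed = "e\u0301"
p decomposed.length                              # 2
p decomposed.unicode_normalize(:nfc) == "\u00e9" # true
```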

Ruby 2.2.0 deprecates the mathn library and updates several others:

  • Psych 2.0.8
  • Rake 10.4.2
  • RDoc 4.2.0
  • RubyGems 2.4.5
  • test-unit 3.0.8
  • minitest 5.4.3

You can find more details, including some C API deprecations and breaking changes, in the Ruby 2.2.0 NEWS file. Compared with Ruby 2.1.0, Ruby 2.2.0 changes 1,557 files, with 125,039 insertions and 74,376 deletions.