Introducing the Ruby Benchmark Suite
With the growing number of Ruby implementations, it is interesting not only to compare their compatibility using a shared set of tests (read more about the RubySpec Project), but also to benchmark the different implementations against each other.
Antonio Cangiano has started the Ruby Benchmark Suite project. We talked to Antonio to learn more about the benchmark suite, the kind of code he plans to include, and how others can contribute.
We asked Antonio about his plans for the Ruby Benchmark Suite:
The idea behind the Ruby Benchmark Suite is that we currently lack a standard set of benchmarks that we can use to measure the performance of Ruby implementations. In my previous shootouts, I used the set of benchmarks that I took from the Ruby 1.9 repository because it was convenient. Those tests alone, however, are admittedly unsuitable for drawing conclusive performance evaluations. Having a Virtual Machine run an empty loop faster than another VM doesn't really tell us much about how the two will compare when running system administration scripts or Rails applications.
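To illustrate the point, an empty-loop test like the ones taken from the Ruby 1.9 repository can be written in a few lines (this is a generic sketch, not an actual file from that repository):

```ruby
require 'benchmark'

# A trivial empty-loop benchmark, similar in spirit to the Ruby 1.9
# repository tests mentioned above. It isolates loop and block-dispatch
# overhead, but says little about real-world workloads.
elapsed = Benchmark.realtime { 5_000_000.times { } }
puts format('empty loop: %.3fs', elapsed)
```

A VM that wins on this kind of test may still lose on string handling, I/O, or garbage-collection-heavy code, which is exactly the gap the suite aims to close.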
Hence the aim of the standard benchmarks is to be general enough so as to be representative of a variety of aspects that are typical of real world Ruby applications. Currently, we classify the benchmarks into the following categories:
- core-features: benchmarks that strictly exercise language features, with little need for library classes beyond basic arithmetic.
- core-library: benchmarks that specifically exercise Ruby's core library classes and methods.
- standard-library: benchmarks that specifically exercise Ruby's standard library classes and methods.
- micro-benchmarks: small benchmarks that are general but still far from real applications. Examples include the benchmarks imported from The Computer Language Benchmarks Game and a few classic algorithms.
- real-world: perhaps the most interesting category; it includes macro-benchmarks extracted from real-world programs. For example, a good log processing script would fit perfectly in this folder.

From the feedback received so far and the interest shown by several developers of alternative Ruby implementations (including developers from GemStone, Microsoft, Engine Yard and Sun), I believe this project has a good chance of doing well.
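A file in the micro-benchmarks category might look something like the following; this is a hypothetical sketch of a classic-algorithm benchmark, not an actual file from the suite:

```ruby
# Hypothetical micro-benchmark: the classic recursive Fibonacci,
# which stresses method dispatch and integer arithmetic.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

puts fib(25)  # => 75025
```

Each standalone file of this kind can simply be executed with the implementation under test, which keeps benchmarks easy to contribute and to run.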
We also asked how he organizes the benchmarks, to which Antonio replied:
Right now they are just a series of standalone benchmarks, but I plan to have a script that can run them and report several metrics, including CPU time and memory usage. It is likely that only execution time will be analyzed for the next shootout, but the long-term plan doesn't ignore memory consumption, which is a particularly important aspect for servers.
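Ruby's standard Benchmark module already provides the timing side of such a runner. The sketch below is purely illustrative (the suite's actual script is not shown in the interview, and the `report` helper is an invented name); it captures user CPU, system CPU and wall-clock time for a single benchmark:

```ruby
require 'benchmark'

# Illustrative runner helper (hypothetical name): measure a block and
# print its user CPU, system CPU and wall-clock times.
def report(label)
  t = Benchmark.measure { yield }
  printf("%-16s user: %6.2fs  system: %6.2fs  real: %6.2fs\n",
         label, t.utime, t.stime, t.real)
end

report('string append') do
  s = ''
  200_000.times { s << 'x' }
end
```

Memory usage is harder to capture portably from within Ruby, which may be one reason the first shootout will focus on execution time.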
The project is Open Source and released under the MIT license, so anyone is welcome to contribute. We're currently hosted on GitHub, and regular contributors will be granted write access to the repository as well. Those who are not too familiar with GitHub or Git can always contribute by sending benchmarks to me by email (acangiano at gmail dot com) or directly to our Google Group.
The best benchmarks are always your own programs, so the most appreciated contributions are those extracted from real programs, regardless of their type (text processing, XML processing, number crunching, etc.). The log processor mentioned above is just one possible idea. For example, in the real-world folder we have Mr. Borasky's matrix benchmark, because it's essentially real code in the field of numeric computing (if it weren't for the fact that many would opt for fast C libraries instead).
Classic algorithms and other micro-benchmarks are welcome but, as already hinted at, we need benchmarks that give us a better indication of how all these Virtual Machines perform in the real world: there is no point in claiming that, say, YARV is three times faster than Ruby 1.8.6 if real applications only show an average 50% performance increase. On a side note, the standard-library folder needs some love too, as we need to improve our coverage of the standard library classes and methods.
Also interesting to know is whether the suite concentrates on the Ruby core and standard library, or whether external libraries are benchmarked as well:
I plan to, at least to a certain extent, given that we don't want the suite to become huge. We need to keep in mind that many Ruby programmers rely on libraries like ActiveRecord or ActiveSupport and would like to see how well each VM performs with them. As a matter of fact, in future shootouts it wouldn't be a bad idea to test even popular frameworks like Rails or Merb. Less mature VMs won't be able to run them, but that too is an important bit of information for a user who's evaluating alternative Ruby implementations.
The last Ruby shootout was performed in December 2007, so we asked about the timeframe for the next one:
I plan to start running the tests for the shootout on June 24th and publish the results on my blog no later than the 30th. These days most of my free time is invested in writing the book Ruby on Rails for Microsoft Developers for Wrox, so the 24th is not an arbitrary date: it's the day after the deadline for my third chapter. If you consider that I'll be testing Ruby 1.8.x, Ruby 1.9, JRuby, Rubinius, IronRuby, MacRuby, Ruby Enterprise Edition and MagLev on (where supported) Mac OS X, Linux (both 32- and 64-bit) and Windows Vista, you have to account for a few days of testing, but I should make it by the 30th.