InfoQ

Oniguruma Java port speeds up JRuby

Posted by Werner Schuster on Nov 28, 2007

Ola Bini reports that Joni, a port of Oniguruma, was merged into the JRuby trunk:
This is a glorious day! Joni (Marcin's incredible Java port of the Oniguruma regexp engine) has been merged to JRuby trunk. It seems to work really well right now.
JRuby team member Marcin Mielczynski took on the job of porting the Oniguruma Regex engine, the Regex engine included in Ruby 1.9.x, to Java.

This might just be the last installment in the (seemingly) never-ending story of JRuby and Regular Expression (Regex) engines. Early JRuby versions used Java's built-in Regex library (included since Java 1.4) to implement Ruby's Regexes. While this was the simplest solution, requiring no third-party libraries or ports, it also brought problems that made it unsuitable for JRuby. Since JRuby aims to be a compatible implementation of Ruby 1.8.x (or future versions), it needs to support the same Regexes. Java's implementation turned out to be incompatible, partly because details of its algorithm caused it to fail on some expressions. Ola explains the steps that followed:
To fix that, we integrated JRegex instead. That's the engine 1.0 was released with and is still the engine in use. It works fairly well, and is fast for a Java engine. But not fast enough. In particular, there is no support for searching for exact strings and failing fast, and the engine requires us to transform our byte[]-strings to char[] or String. Not exactly optimal. Another problem is that compatibility with MRI suffers, especially in the multi byte support.
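The byte[] point can be illustrated with a small Java sketch. It is not JRuby, JRegex, or Joni code: the class and method names are hypothetical, java.util.regex stands in for any engine that only accepts char-based input, and the single-byte ISO-8859-1 encoding is an assumption made to keep the example short. It shows the decoding copy such an engine forces on byte[]-backed strings, and a simple byte-level literal check that lets a search fail fast before any decoding happens.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from JRuby or Joni.
final class ByteStringRegexSketch {

    // Naive scan for a literal byte sequence; returns the first index or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 0;
        }
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // java.util.regex matches against a CharSequence, so a byte[]-backed
    // string has to be decoded (and therefore copied) before every search.
    static boolean findViaDecode(Pattern pattern, byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return pattern.matcher(decoded).find();
    }

    // Fail fast: if a literal that every match must contain is absent from
    // the raw bytes, skip both the decoding copy and the regex search.
    static boolean findWithPrefilter(Pattern pattern, byte[] requiredLiteral, byte[] input) {
        if (indexOf(input, requiredLiteral) < 0) {
            return false;
        }
        return findViaDecode(pattern, input);
    }

    public static void main(String[] args) {
        Pattern title = Pattern.compile("<title>(\\w+)</title>");
        byte[] literal = "<title>".getBytes(StandardCharsets.ISO_8859_1);

        byte[] withTitle = "<doc><title>JRuby</title></doc>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] withoutTitle = "<doc><body>no heading</body></doc>".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(findWithPrefilter(title, literal, withTitle));
        System.out.println(findWithPrefilter(title, literal, withoutTitle));
    }
}
```

An engine that matches directly on byte[] avoids the decoding step altogether, which is the copy Ola describes as "not exactly optimal".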
All these problems seem to be, or will be, solved with Joni. Regex performance has been a big problem in the past (e.g. see Lessons from building Oracle Mix on JRuby on Rails), but Joni seems to help with that too. Charles Nutter looked at REXML performance with the new code:
After running through a series of basic optimizations, most of the key expressions we worried about were performing as well as or much better than JRegex, so Ola went through with the conversion over the past couple days. Marcin is continuing to work on various optimizations, but both Ola and I have been playing with the new code. And it's looking great.
The linked article continues with benchmark results comparing the code before and after the merge, which show significant speedups with the Joni code.

These issues also point to a problem shared by many alternative Ruby implementations. Rubinius, a Ruby implementation written (mostly) in Ruby, uses the simple solution of including Oniguruma. Ruby implementations based on VMs such as the JVM or .NET, however, have the problem that including a native library makes deployment more difficult (they would need to ship platform-specific versions). Beyond that, as Marcin explains in a comment on Ola's blog, there are other integration issues:
We've been thinking about [including Oniguruma] already. There are few reasons:
Threading: Oniguruma uses global locks when initializing code range tables or managing shared AST nodes (like Character Class hashtable). Oniguruma bytecode interpreter also uses thread locks (it can be turned off but we get it for free in java land, and it'd be a hack to mix foreign threading with java one).
Exceptions: it would be hard to recover from segfaults. Converting Oniguruma errors to Ruby exceptions would also be an ugly hack.
JNI: it requires data separation, so all strings/bytes would have to be copied.
Additional binary distribution: good luck compiling it on a Mainframe :D
