Ruby XML Roundup: Hpricot 0.7, Stable Libxml-ruby and Nokogiri

by Werner Schuster on Mar 24, 2009

Ruby's XML story has improved lately thanks to a small arms race between the XML libraries Nokogiri, Hpricot and libxml-ruby. Nokogiri, released last fall, is based on the native libraries libxml2 and libxslt:

Since Nokogiri leverages libxml2, consumers get (among other things) fast parsing, i18n support, fast searching, standards based XPath support, namespace support, and mature HTML correction algorithms.

Nokogiri also supports searching with both XPath and CSS selectors, and runs on Ruby 1.9.1.

After benchmarks showed Nokogiri leading on performance, Hpricot's maintainer _why put effort into improving the library and recently released Hpricot 0.7:

Please enjoy a succulent, new Hpricot. A bit faster, some Ruby 1.9 support, and assorted fixes. [..]

I'm sure you're wondering what's the reason for Hpricot updates, in the face of heated competition from the Nokogiri and LibXML libraries. Remember that Hpricot has no dependencies and is smaller than either of those libs. Hpricot uses its own Ragel-based parser, so you have the freedom to hack the parser itself, the code is dwarven by comparison.

Best of all, Hpricot has run on JRuby in the past. And I am in the process of merging some IronRuby code[1] and porting 0.7 to JRuby. This means your code will run on a variety of Ruby platforms without alteration. That alone makes it worthwhile, wouldn't you agree?

Finally, libxml-ruby was released as version 1.0 with:

* Ruby 1.9.1 support
* Out of the box support for OS X 10.5 and MacPorts [..]
* A nice, clean API that makes it easy to do simple things, but provides all the power of libxml2 if you need it

The latest version is 1.1.3, which was released with a crucial improvement:

Working through the options one-by-one, I finally found the culprit, an obscure field in the structure:
int dictNames: Use dictionary names for the tree
What this setting controls is whether libxml2 uses a dictionary to cache strings it has previously parsed. Caching strings makes a big difference, so by default it should be enabled. That is now the case with libxml-ruby 1.1.3 and higher.

With this change, libxml-ruby now performs about as well as Nokogiri.
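To give a feel for why a string dictionary helps, here is a minimal sketch of the general idea in plain Ruby. It is not libxml2's implementation (that lives in C inside the library); the class name `StringDictionary` and its methods are hypothetical and exist only for this illustration. The point is that repeated names, such as element names in a large document, are stored once and then reused instead of being allocated again.

```ruby
# Hypothetical illustration of a string dictionary; not libxml2's code.
class StringDictionary
  def initialize
    @entries = {}
  end

  # Return the single shared, frozen copy of +name+,
  # storing it the first time that content is seen.
  def intern(name)
    @entries[name] ||= name.dup.freeze
  end

  # Number of distinct strings stored so far.
  def size
    @entries.size
  end
end

dict = StringDictionary.new

# Pretend these are element names coming out of a parser;
# String.new makes each one a separate object, as a parser would.
names = %w[item name item name item].map { |n| dict.intern(String.new(n)) }

puts dict.size                  # distinct strings kept in the dictionary
puts names[0].equal?(names[2])  # whether both "item" entries are one object
```

With five names but only two distinct values, the dictionary holds two strings, and every repeated name points at the same object.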
