InfoQ

News

Treetop - PEG parser generator for Ruby

Posted by Werner Schuster on Jan 17, 2008 07:00 PM

Community
Ruby
Topics
Domain Specific Languages,
Programming
Tags
Code Generation,
Language Features,
Languages
Ruby already ships with a parser generator called RACC, a version of YACC (which is used to build ruby_parser, the first Ruby parser written in Ruby).

When it comes to parser generators, Parsing Expression Grammars (PEG) have become quite popular recently after a thesis by Bryan Ford introduced an optimization called "Packrat Parsing". Packrat parsing solves the problem of this kind of parser, i.e. exponential parse time. This is caused by the fact that the parser uses backtracking to parse the code, i.e. they try many if not all possible combinations of outcomes until they figure out the right one. The solution of Packrat parsing is to use memoization, i.e. storing intermediate parsing results, instead of calculating these results over and over. This allows for the runtime behavior to be linear, but has the downside of relatively large memory requirements, potentially multiple times the size of the input source. Note that other parser generators, such as ANTLR, use similar approaches.

With this in mind, the Treetop website explains the benefits of PEGs:
Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR-1 grammars of traditional parser generators. There's no need for a tokenization phase, and lookahead assertions can be used for a limited degree of context-sensitivity.
Treetop generates parse trees automatically but allows the user to customize the generated nodes by adding methods:
grammar Arithmetic
 rule additive
 multitive '+' additive {
 def value
 multitive.value + additive.value
 end
  }
 /
 multitive
end
# other rules below ...
end
This means that the node generated for the additive node will have a method named value. Alternatively, it's possible to specify a node class to be generated for each rule. (Note: the slash is the choice operator, i.e. the additive is either an two operands separated by a plus character or just the result of the multitive rule).

To get started with Treetop, you need to install it first. Either get the Treetop source from the Rubyforge project, or install it as a gem:
gem install treetop 
To start using it, check the Treetop documentation or look at the provided examples. Treetop ships with a simple parser for arithmetic expressions and a very basic language parser and runtime.

Treetop can turn a grammar definition file into Ruby code with the tt utility:
tt foo.treetop  
Another option is to do the parser generation from Ruby code:
Treetop.load "arithmetic"
parser = ArithmeticParser.new
parser.parse('1+1')

For a live demo of the Treetop by it's creator, see Nathan Sobo's RubyConf 2007 presentation of Treetop

1 comment

Reply

Documentation by Mickael Faivre-Macon Posted Jan 26, 2008 3:57 PM
  1. Back to top

    Documentation

    Jan 26, 2008 3:57 PM by Mickael Faivre-Macon

    The documentation is terse, and lack many example for use in ruby code. Does someone know if the project has a mailing list of a forum ?

Exclusive Content

Rationalizing the Presentation Tier

Thin client paradigm characterized by web applications is a kludge that needs to be repudiated. Old compromises are no longer needed and it's time to move the presentation tier to where it belongs.

Agile Project Management: Lessons Learned at Google

In this presentation filmed during QCon 2007, Jeff Sutherland, the creator of Scrum, talks about his visit at Google to do an analysis of Google's first implementation of Scrum.

AtomServer – The Power of Publishing for Data Distribution

In this article, Bryon Jacob and Chris Berry introduce AtomServer, their implementation of a full-fledged Atom Store based on Apache Abdera, which is now available as open source.

An Introduction to Virtualization

It is easy to think that virtualization applies only to servers. In reality the recent resurgence of the concept is also being applied to networking, storage, and application infrastructure.

REST Anti-Patterns

In this article, Stefan Tilkov explains some of the most common anti-patterns found in applications that claim to follow a "RESTful" design and suggests ways to avoid them.

Choosing between Routing and Orchestration in an ESB

In this article, Adrien Louis and Marc Dutoo discuss the differences and relative merits of using orchestration vs. routing in a typical ESB setup, and discuss various implementation options.

Enterprise Batch Processing with Spring

Wayne Lund discusses batch processing, Spring Batch objectives and features, scenarios for usage, Spring Batch architecture, scaling, example code, failures and retrying, and the future roadmap.

User Story Estimation Techniques

Developer Jay Fields draws on his experiences as a ThoughtWorks consultant to describe effective user story estimation techniques.