InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Treetop - PEG parser generator for Ruby

Posted by Werner Schuster on Jan 17, 2008

Sections
Development,
Architecture & Design
Topics
Domain Specific Languages ,
Ruby ,
Programming
Tags
Languages ,
Language Features ,
Code Generation

Ruby already ships with a parser generator called RACC, a version of YACC (which is used to build ruby_parser, the first Ruby parser written in Ruby).

When it comes to parser generators, Parsing Expression Grammars (PEG) have become quite popular recently after a thesis by Bryan Ford introduced an optimization called "Packrat Parsing". Packrat parsing solves the problem of this kind of parser, i.e. exponential parse time. This is caused by the fact that the parser uses backtracking to parse the code, i.e. they try many if not all possible combinations of outcomes until they figure out the right one. The solution of Packrat parsing is to use memoization, i.e. storing intermediate parsing results, instead of calculating these results over and over. This allows for the runtime behavior to be linear, but has the downside of relatively large memory requirements, potentially multiple times the size of the input source. Note that other parser generators, such as ANTLR, use similar approaches.

With this in mind, the Treetop website explains the benefits of PEGs:

Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR-1 grammars of traditional parser generators. There's no need for a tokenization phase, and lookahead assertions can be used for a limited degree of context-sensitivity.

Treetop generates parse trees automatically but allows the user to customize the generated nodes by adding methods:

grammar Arithmetic
 rule additive
  multitive '+' additive {
   def value
    multitive.value + additive.value
   end
  }
 /
  multitive
 end
# other rules below ... 
end 

This means that the node generated for the additive node will have a method named value. Alternatively, it's possible to specify a node class to be generated for each rule. (Note: the slash is the choice operator, i.e. the additive is either an two operands separated by a plus character or just the result of the multitive rule).

To get started with Treetop, you need to install it first. Either get the Treetop source from the Rubyforge project, or install it as a gem:

gem install treetop 

To start using it, check the Treetop documentation or look at the provided examples. Treetop ships with a simple parser for arithmetic expressions and a very basic language parser and runtime.

Treetop can turn a grammar definition file into Ruby code with the tt utility:

tt foo.treetop  

Another option is to do the parser generation from Ruby code:

Treetop.load "arithmetic"
parser = ArithmeticParser.new
parser.parse('1+1') 


For a live demo of the Treetop by it's creator, see Nathan Sobo's RubyConf 2007 presentation of Treetop

Documentation by Mickael Faivre-Macon Posted
  1. Back to top

    Documentation

    by Mickael Faivre-Macon

    The documentation is terse, and lack many example for use in ruby code. Does someone know if the project has a mailing list of a forum ?

Educational Content

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.

Interview: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives

InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.