InfoQ

ruby_parser 1.0: a Ruby Parser written in Ruby

Posted by Werner Schuster on Dec 28, 2007

Ryan Davis announced ruby_parser, a parser for Ruby source code written in Ruby. The parser was written using Ruby yACC (RACC), a parser generator that is bundled with the Ruby standard library:
ruby_parser (RP) is a ruby parser written in pure ruby (utilizing racc--which does by default use a C extension). RP's output is the same as ParseTree's output: s-expressions using ruby's arrays and base types.
The library is easy enough to use:
RubyParser.new.parse "1+1" 
which returns
s(:call, s(:lit, 1), :+, s(:array, s(:lit, 1))) 
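Because the output is built from ordinary Ruby arrays and base types, it can be walked with plain recursion. The following is a minimal sketch, assuming nested arrays stand in for the s(...) nodes so that it runs without ruby_parser installed; the helper name literal_values is hypothetical and not part of ruby_parser:

```ruby
# Plain nested arrays stand in for the s(...) nodes shown above
# (an assumption, so the sketch does not depend on ruby_parser).
tree = [:call, [:lit, 1], :+, [:array, [:lit, 1]]]

# Hypothetical helper: collect the value of every :lit node, depth first.
def literal_values(node)
  return [] unless node.is_a?(Array)
  own = node.first == :lit ? [node[1]] : []
  own + node.drop(1).flat_map { |child| literal_values(child) }
end

p literal_values(tree)
```

The same recursive pattern would apply to other node types; only the check on the first element changes.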
A Ruby parser written in pure Ruby has long been missing in the Ruby world. To clarify the term "pure Ruby" in this context: it means that the parser's code
  • consists solely of Ruby source files
  • does not add any native extensions or other C code (e.g. with RubyInline) that would require a C compiler to be present on the user's system
These properties are crucial to ensure that the code runs across all Ruby runtimes. An implementation that required a C-based native extension would be unusable on Ruby implementations that don't support such extensions, e.g. JRuby, XRuby, or the .NET-based runtimes IronRuby and Ruby.NET. Even if they supported native extensions (something under consideration for JRuby), native extensions cause deployment problems, because a shared library or DLL of the extension would have to be shipped for every OS/CPU combination in use (otherwise some users would not be able to use it). RubyInline, also a project by Ryan Davis, helps with this a bit by automatically compiling the inline C code, but this still requires a C compiler to be present on the target system - something that's not guaranteed, particularly on Windows systems.

The lack of a pure Ruby parser was not much of a problem for a long time in Ruby's history, since the Abstract Syntax Tree (AST) of some Ruby code could be obtained with utilities such as ParseTree. However, ever since the explosion of alternative Ruby runtimes, Ruby parsers have been reimplemented several times - twice in Java (JRuby and XRuby) and once in C# (Ruby.NET wrote the parser that is also used by IronRuby). All of these provide different ASTs and different ways of getting at them.
This has caused some issues for Ruby source tools. For example, the Ruby refactoring tools, now part of the Eclipse-based Aptana/RDT, are tied to both Java and the JRuby AST and are not usable from other Ruby implementations. With similar tools now being (re)written for other Java-based Ruby IDEs, a large number of code quality and code manipulation tools are now locked into Java and JRuby. Not just that: the logic of these tools is written in Java and not Ruby, which makes them less approachable for Ruby developers.

The pure Ruby parser now offers a chance to change that - a Ruby IDE or other tool can now get at a Ruby AST without getting locked in. For example, a Java-based IDE can keep a JRuby instance around and run ruby_parser in it. For that use, however, the current version still needs to add correct source locations to the generated output - i.e. every AST node needs to know the offsets where its source code representation starts and ends. This is crucial for source tools - pure structural information is useful, but if the tool doesn't know where a node is, it can't modify it in the file.
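
To illustrate why offsets matter, here is a minimal sketch of a source edit driven by node positions. The LocatedNode struct, its start_offset/end_offset fields and the replace_node helper are all hypothetical; ruby_parser 1.0 does not provide location information:

```ruby
# Hypothetical node type carrying source offsets (not part of ruby_parser 1.0).
# end_offset is exclusive.
LocatedNode = Struct.new(:type, :value, :start_offset, :end_offset)

# Replace exactly the text a node covers, leaving the rest of the source untouched.
def replace_node(source, node, new_text)
  source[0...node.start_offset] + new_text + source[node.end_offset..-1]
end

source = "total = 1+1"
# The second literal `1` occupies offsets 10...11 in the source string.
second_literal = LocatedNode.new(:lit, 1, 10, 11)

puts replace_node(source, second_literal, "41")
```

Without the two offsets, a tool would know that a literal exists in the tree but would have to re-scan the source text to find which of the two `1` characters to change.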

Another client for ruby_parser is Rubinius, a Ruby VM written (mostly) in Ruby, which currently takes its Ruby parser from MRI. Using ruby_parser will allow it to remove yet another piece of C code. To answer the chicken-and-egg question "How can a Ruby VM work if its parser is Ruby code?": during the build process of the Rubinius VM, the Ruby source code of ruby_parser can be compiled into Rubinius bytecode. When Rubinius starts, it loads the ruby_parser bytecode file - something that doesn't need the parser - and now has a running Ruby parser.
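
The bootstrap idea can be sketched by analogy on MRI. This is an assumption-laden illustration only: it uses MRI's RubyVM::InstructionSequence (available in MRI 2.3 and later), not Rubinius' own compiler or bytecode format, and parse_stub is a hypothetical stand-in for the parser:

```ruby
# MRI-only analogy (assumption): RubyVM::InstructionSequence stands in for the
# Rubinius compiler; Rubinius has its own bytecode format and loader.

# Build time: an existing parser/compiler turns Ruby source into bytecode.
parser_source = "def parse_stub(code); [:parsed, code]; end"
bytecode = RubyVM::InstructionSequence.compile(parser_source).to_binary

# Startup: the bytecode is loaded and run without parsing any source text.
RubyVM::InstructionSequence.load_from_binary(bytecode).eval

p parse_stub("1+1")
```

In a real build the bytecode would be written to a file at build time and read back at startup; the in-memory string here keeps the sketch self-contained.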

There's still a lot of work left for ruby_parser, as can be seen from a few of the issues mentioned in the release notes:
  • Known Issue: Speed sucks currently. 5500 tests currently run in 21 min.
  • Known Issue: Code is waaay ugly. Port of a port. Not my fault. Will fix RSN.
  • Known Issue: I don't currently support newline nodes.
  • Known Issue: Totally awesome.
  • Known Issue: dasgn_curr decls can be out of order from ParseTree's.
  • TODO: Add comment nodes.
