InfoQ

ruby_parser 1.0: a Ruby Parser written in Ruby

Posted by Werner Schuster on Dec 28, 2007 05:00 AM

Ryan Davis announced ruby_parser, a parser for Ruby source code written in Ruby. The parser was written using Ruby yACC (RACC), a parser generator that is bundled with the Ruby standard library:
"ruby_parser (RP) is a ruby parser written in pure ruby (utilizing racc--which does by default use a C extension). RP's output is the same as ParseTree's output: s-expressions using ruby's arrays and base types."
The library is easy enough to use:
require 'ruby_parser'
RubyParser.new.parse "1+1"
which returns
s(:call, s(:lit, 1), :+, s(:array, s(:lit, 1))) 
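Because the output is built from arrays and base types, it can be processed with ordinary array code. The following is a minimal sketch, not part of ruby_parser: it uses plain nested arrays as a stand-in for the s(...) output above, and the helper names node_types and evaluate are hypothetical. It handles only the three node types that appear in the example.

```ruby
# Collect the node type (first element) of every nested node, depth first.
# Assumption: each node is an array whose first element is its type symbol.
def node_types(node)
  return [] unless node.is_a?(Array)
  [node.first] + node.drop(1).flat_map { |child| node_types(child) }
end

# Evaluate the tiny subset of node types found in the "1+1" example.
def evaluate(node)
  type, *rest = node
  case type
  when :lit
    rest.first                              # literal value
  when :array
    rest.map { |child| evaluate(child) }    # argument list
  when :call
    receiver, message, args = rest
    evaluate(receiver).public_send(message, *evaluate(args))
  else
    raise ArgumentError, "unsupported node type: #{type.inspect}"
  end
end

# Plain arrays standing in for s(:call, s(:lit, 1), :+, s(:array, s(:lit, 1)))
tree = [:call, [:lit, 1], :+, [:array, [:lit, 1]]]
node_types(tree) # => [:call, :lit, :array, :lit]
evaluate(tree)   # => 2
```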
A Ruby parser written in pure Ruby has long been missing in the Ruby world. To clarify the term "pure Ruby" in this context: it means that the parser's code
  • consists solely of Ruby source files
  • does not add any native extensions or other C code (e.g. with RubyInline) that would require a C compiler to be present on the user's system
These properties are crucial to ensure that the code runs across all Ruby runtimes. An implementation that required a C-based native extension would be unusable on Ruby implementations that don't support them, e.g. JRuby, XRuby, or the .NET-based runtimes IronRuby and Ruby.NET. Even if they supported native extensions (something under consideration for JRuby), native extensions cause deployment problems, because a shared library or DLL of the extension would have to be shipped for every conceivable OS/CPU combination (otherwise some users would not be able to use it). RubyInline, also a project by Ryan Davis, helps with this a bit by automatically compiling the inline C code, but this still requires a C compiler to be present on the target system - something that's not guaranteed, particularly on Windows systems.

The lack of a pure Ruby parser was a minor problem for much of Ruby's history, since getting the Abstract Syntax Tree (AST) of some Ruby code was possible with utilities such as ParseTree. However, since the explosion of alternative Ruby runtimes, Ruby parsers have been reimplemented several times - twice in Java (JRuby and XRuby) and once in C# (Ruby.NET wrote the parser also used by IronRuby). All of these provide different ASTs and different ways of getting at them.
This has caused some issues for Ruby source tools. For example, the Ruby refactoring tools, now part of the Eclipse-based Aptana/RDT, are tied to both Java and the JRuby AST and are not usable from other Ruby implementations. With similar tools now being (re)written for other Java-based Ruby IDEs, a large number of code quality and code manipulation tools are now locked into Java and JRuby. Not just that: the logic of these tools is written in Java and not Ruby, which makes them less approachable for Ruby developers.

The pure Ruby parser now offers a chance to change that - a Ruby IDE or other tool can now get at a Ruby AST without getting locked in. For example, a Java-based IDE can keep a JRuby instance around and run ruby_parser in it. For that to work, the current version still needs to add correct source locations to the generated output - i.e. every AST node needs to know the offsets where its source code representation starts and ends. This is crucial for source tools - pure structural information is useful, but if the tool doesn't know where a node is, it can't modify it in the file.
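To illustrate why offsets matter, here is a minimal sketch of a source edit driven by node locations. It is purely hypothetical: the node is a plain hash, the keys start_offset and end_offset and the helper replace_node are invented for this example, and ruby_parser's output does not carry such locations in the release described here.

```ruby
# Return a copy of the source with the text covered by the node replaced.
# Assumption: start_offset is inclusive and end_offset is exclusive, both
# counted in characters from the beginning of the source string.
def replace_node(source, node, new_text)
  before = source[0...node[:start_offset]]
  after  = source[node[:end_offset]..-1]
  before + new_text + after
end

source = "1+1"
# Hypothetical located node for the second literal, covering the final "1".
second_literal = { type: :lit, value: 1, start_offset: 2, end_offset: 3 }

replace_node(source, second_literal, "41") # => "1+41"
```

Without the two offsets, a tool would know that the tree contains a second literal but not which characters in the file to rewrite.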

Another client for ruby_parser is Rubinius, a Ruby VM written (mostly) in Ruby, which currently takes its Ruby parser from MRI. Using ruby_parser will allow it to remove yet another piece of C code. This raises a chicken-and-egg question: "How can a Ruby VM work if its parser is Ruby code?" The answer: during the build process of the Rubinius VM, the Ruby source code of ruby_parser can be compiled into Rubinius bytecode. When Rubinius starts, it loads the ruby_parser bytecode file - something that doesn't need the parser - and now has a running Ruby parser.
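The same two-phase idea can be sketched by analogy in MRI. This is not Rubinius's actual mechanism: it assumes an MRI version that provides the experimental RubyVM::InstructionSequence#to_binary and .load_from_binary methods, and the toy parser and the helper names compile_to_bytecode and load_bytecode are made up for illustration.

```ruby
# "Build time": turn Ruby source text into a serialized bytecode blob.
def compile_to_bytecode(source)
  RubyVM::InstructionSequence.compile(source).to_binary
end

# "Startup": load the blob and run it. Only the bytecode is read here;
# no Ruby source text has to be parsed at this point.
def load_bytecode(blob)
  RubyVM::InstructionSequence.load_from_binary(blob).eval
end

# A toy stand-in for a parser written in Ruby (not ruby_parser itself).
toy_parser_source = <<~RUBY
  def toy_parse(text)
    text.split("+").map { |token| Integer(token) }
  end
RUBY

blob = compile_to_bytecode(toy_parser_source) # done once, during the build
load_bytecode(blob)                           # done at every startup
toy_parse("1+1") # => [1, 1]
```

In a real build the blob would be written to a file and read back at startup; it is kept in memory here to keep the sketch short.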

There's still a lot of work left for ruby_parser, as can be seen from a few of the issues mentioned in the release notes:
  • Known Issue: Speed sucks currently. 5500 tests currently run in 21 min.
  • Known Issue: Code is waaay ugly. Port of a port. Not my fault. Will fix RSN.
  • Known Issue: I don't currently support newline nodes.
  • Known Issue: Totally awesome.
  • Known Issue: dasgn_curr decls can be out of order from ParseTree's.
  • TODO: Add comment nodes.
