InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Google SoC Series: ANTLR v3 Ruby Parser

Posted by Werner Schuster on Apr 27, 2007

Sections
Process & Practices,
Architecture & Design,
Development
Topics
Ruby ,
Code Analysis ,
Dynamic Languages
Tags
XRuby ,
Google Summer of Code ,
Interviews
XRuby  is an effort to create a Ruby to Java bytecode compiler.  A compiler, of course, needs a way to parse the input language, and so the XRuby team created their own Ruby parser using the popular  ANTLR parser generator. Parser generators take the grammar of a language and generate code to parse it. Using  ANTLR meant that the grammar and parser had to be created from scratch, which is different from JRuby's approach. Ruby's parser uses another parser generator called YACC, and JRuby chose to reuse this grammar and use a Java port of Yacc to generate it's parser.

When asked about whether Ruby is a difficult language to parse, Wang Haofei, who works on this Google SoC project and the XRuby team, explains:
Yes it is. There are lots of ambiguities in the language. For example, "<<" can be either left shift operator or start of heredoc. To distinguish these two the lexer has to maintain state (context dependent): http://seclib.blogspot.com/2005/11/distinguish-leftshift-and-heredoc.html

Others likes ID/function ambiguity, expression substitution inside string, heredoc etc are all very difficult to deal with.
When it comes to a difficult task like this, it's always good to ask just how far along it  really is. Wang Huofei:
Since the first public release it can parse the entire ruby standard libraries and and Ruby on Rails (have not try the latest one recently): http://seclib.blogspot.com/2006/02/first-release-of-rubyfront.html

Xue has fixed a few bugs since then, but generally it is very stable. We will write and run more tests during the SOC project and it may help us to uncover some unknown issues.
Xue Yong Zhi is the mentor for this Google SoC project and also works on XRuby.

One big part of the Google SoC project seems to be porting the existing parser to ANTLR's new version 3. Wang Huofei:
1. ANTLR v3 is a rewrite of v2 and has significantly enhanced parsing strength via LL(*) parsing, v2 is much weaker (limited LL(k)) and it forced me to add some hacks to workaround some problems. While ANTLR based parser is easier to maintain than others, migration to v3 will help us to make it better and cleaner.
2. There will be a ruby backend for ANTLR v3, so that we may be able to have ruby parser in ruby.
3. ANTLR v3 's performance is much better.
The second item in this list brings up an a very interesting point. Ruby lacks a Ruby parser written in Ruby. This is an issue for writing tools that handle Ruby code. Code analyzers, refactoring tools and automated refactorings, formatters and more, are difficult, if not impossible, to write in Ruby, because there is no way to parse Ruby source with Ruby code. There are workarounds, like Ryan Davis' ParseTree which uses the parser of Ruby's interpreter (via a native extension) to get at the Abstract Syntax Tree (AST) for Ruby source. An AST is a tree representation of source code, and is necessary for tools that need to know about the structure of the code. Yet, even ParseTree is not a complete solution, since it's current versions don't give the source locations of individual nodes. Obviously, a refactoring algorithm that, for instance, wants to rename an identifier in a Ruby source file needs to know where the identifier actually is.

This issue has become ever more obvious in the past year, due to the arrival of various Ruby IDEs. These ship with code analyzers (to warn about potential errors in the code) and the Eclipse based RDT is the first IDE to feature extensive refactoring support for Ruby. Other features are support for popular Ruby based files, such as Rake files, Ruby's equivalent of make or Ant. The problem: these tools are all built with Java (or other languages) and Java IDEs all use JRuby's parser.

This means that the functionality of these tools is locked in those languages, and even worse, often tied to particular IDEs. For instance, the logic for RDT's refactoring support is not available for, say, Ruby in Steel, an IDE built on Visual Studio. This is different than, for instance,  in the Java space, where parsers are available.  Tools such as PMD or Findbugs are naturally  written in Java and thus available wherever Java runs and, more importantly, can be extended with Java code.

Because the Google SoC description of this project isn't 100% clear on the Ruby based parser, Wang Huofei clarifies the project plan:
It depends on how well we are doing. We are willing to do that even if it may not fit in SOC schedule.
Good news.

One necessity for making code tools is the AST, to analyze source. ParseTree, as already mentioned, offers a format to represent Ruby source code. Tools, based on ParseTree, already exist, such as Ruby2Ruby which can turn an AST back into Ruby source code; useful if a tool wants to modify an AST and then output it as Ruby source. Rubinius, a project to implement a Ruby VM in Ruby, also uses ParseTree output to compile Ruby to the Rubinius bytecode, which is then interpreted. When asked about the output of the parser, Wang Haofei explains:
ANTLR has its own builtin AST support, and it is very handy if you want to serialize to a string or change to other struture. It looks similar to  ParseTree's output. In XRuby we turned the AST into a DOM-like structure and use visitor pattern to generate java bytecode.
While ParseTree output doesn't seem to be planned, it's entirely possible to translate the ANTLR generated AST into the ParseTree format. A similar approach is already used by  JParseTree, a port of ParseTree that works on JRuby, now a part of JRuby Extras which provides JRuby ports of popular Ruby libraries.

One way to watch the progress of XRuby and it's parser project is the XRuby team blog.

Related Sponsor

In today’s hyper-competitive world, later may be too late to adopt Agile development and this Roadmap for Success will help you get started. Download "Agile Development: A Manager's Roadmap for Success" now!

Full Ruby IDE by Sebastien Auvray Posted
Re: Full Ruby IDE by Werner Schuster Posted
Re: Full Ruby IDE by zqu dlyba Posted
  1. Back to top

    Full Ruby IDE

    by Sebastien Auvray

    This step to ANTLR v3 parsing is promising!
    On a side note it's not the only stopper for a full-ruby IDE, do you know any robust crossplatform Rich Client framework in Ruby (or easily pluggable with it) that can compete with the ones in Java or limited-platform natives C guis ? Tk & friends are a joke.

  2. Back to top

    Re: Full Ruby IDE

    by Werner Schuster

    Well, there are lots of GUI toolkits, but I haven't used any of them. The only GUI Ruby code I wrote was in JRuby using SWT.
    I tried using FreeRIDE freeride.rubyforge.org/wiki/wiki.pl but I'm not sure how alive it is. The problem is, of course, that writing an IDE from scratch is a _big_ task, and that's just the start, because the IDE is only really valuable when an ecosystem develops around it. Eclipse has that (and with RDT/RadRails has great Ruby support), I guess Visual Studio has that too.
    Frankly, I'd prefer using Eclipse RDT, but I'd like to have things like a Ruby Lint or Refactoring or other tools written in Ruby. This is possible now (the code can be run with JRuby) but this code would rely on the JRuby AST, which, naturally, isn't available anywhere else.
    I wrote about the need for a standard AST in Ruby in more detail here:
    jroller.com/page/murphee?entry=ruby_let_s_get_an

  3. Back to top

    Re: Full Ruby IDE

    by zqu dlyba

    wxRuby

Educational Content

Jesper Boeg on Priming Kanban

In this interview, Jesper Boeg, author of the new InfoQ book – Priming Kanban, discusses the keys to using Kanban effectively, and how to get started if you are currently using other approaches.

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.