InfoQ

InfoQ

Article

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Using singleton classes for object metadata

Posted by Werner Schuster on Dec 05, 2007

Sections
Process & Practices,
Architecture & Design,
Development
Topics
Code Analysis ,
Language ,
Ruby ,
Dynamic Languages ,
Object Oriented Design
Tags
OOP

So you have a bunch of objects - an object graph - the result of some operation or API call. The task: analyze the data and store the results of the analysis as metadata for the graph.

Too abstract? Think of how a compiler works: the parser spits out a parse tree (or Abstract Syntax Tree or AST) that represents the code as tree structure. Then: a lot of algorithms run over the tree in passes: gathering symbols in the symbol table, doing type inference, using these types to do type checking, etc.

But wait: the last two passes show the problem: where does the type inferencing code store its data for the type checker to use? It'd be most convenient to store the data where it's needed - i.e. if an expression node (in the AST) is found to return type Foo, then it'd be best to store that information right in the node.

To illustrate the solution, we'll look at some tools that work on ASTs - not a compiler, but tools with similar requirements. In Ruby, the ParseTree library returns Ruby source code as an AST. Example:

[:vcall, :obj, :say_hello, [:array, [:lit, 42], [:lvar, :foo]]] 

This is a bit of Ruby code represented as ParseTree AST. Since this is an article and not a VM, it's a bit difficult to work with objects and object references organized as a graph - so we use ParseTree's s-expr representation of the tree. S-exprs are nested lists; each list represents a node in the tree, with the first element specifying the node type. In the sample, the node represents a call to a virtual method (vcall). The parameters are the receiver (the 'self' in the called method), the name of the method and the parameters.

Ideas for tools would be a type checker, static analysis tools or automatic refactorings. One of the requirements for tools of that kind to work is some kind of type inference, i.e. determining the types of variables or expressions. For instance, this AST represents a call to the to_s method:

[:vcall, :obj, :to_s] 

What information can be gathered about this, and what could be done with this information? The type of the return value, for instance. This is hard to determine - but since it's to_s, a method that returns the string representation of an object - let's just say it returns "String". This is not necessarily true - it's a guess. For tools such as a code completion, it's fine enough - 100 % accuracy isn't possible for these matters. Although in some cases, more information could be gathered, and the analyzer could determine more accurate information.

Analyzers work on the tree and annotate the AST with the metadata they have determined. For keeping codebases modular, it's a good idea to separate the analyzers from the metadata consumers, i.e. the code that walks the AST and does something with the metadata. For instance, a Ruby editor could highlight a method that overrides another in its superclass.

The solution

Long story short: here's a solution for annotating ParseTree nodes:

node = [:vcall, :obj, :to_s] 

def node.set_metadata(key, value)
 @_metadata ||= Hash.new
 @_metadata[key] = value
end
def node.metadata
@_metadata ||= {}
end

node.set_metadata(:type, :String)

What does that do?

The secret sauce: singleton classes

Every object in Ruby is the instance of a class. Unlike many other OOP languages, Ruby allows you to change an object's class. Don't confuse this with open classes: in Ruby, it's possible to modify a class - even at runtime. Singleton classes are similar - except that they only affect one object's class. While the difference might seem small - it has the big benefit of limiting the effects of the class modification to the affected object. Open classes, on the other hand, change the class definition for all the code and all the objects of these classes. This is useful, but can also mess with other people's code and if too many pieces of code do it on common classes, name clashes with existing methods can happen. In the case of ParseTree nodes - which are ordinary Ruby Arrays - this means that only the actually used Array objects are affected - not all the Arrays on the heap.

Another way to see this: with singleton classes, the changes to the class stay local to the code that creates and uses them - they never need to be visible outside. Open classes, on the other hand, are global changes: a class name is a global variable, i.e. "String" refers to the class object representing Strings. Just as more scoped variables (local variables, member variables) are preferable over global variables, so are singleton classes to open classes.

Of course: which solution to choose (singleton or open classes) depends on the particular situation. With open classes, the change to the class happens once - after it, all objects of that class have the added methods. With singleton classes, the change to the class (i.e. creating the singleton class and changing the object's class pointer to point to it) happens at every change.

The syntax - as seen above - is simple:

def object_variable.method_name()
# code
end

The variable pointing to the object is prefixed to the method name. A more flexible and modular way is to use Mixins to do it. This allows to collect methods that handle some aspect in a Module and mix them in in one go. E.g.

module Metadata 
def set_metadata(key, val)
@_metadata[key] = val
end
def metadata
@_metadata ||= {}
end
end
x = [:vcall, :obj, :to_s]
x.extend Metadata
x.set_metadata(:type, :String)

Mixins allow to mix functions defined in a Module into a class - in this sample, it's the node object's singleton class. Again: only this object will have these methods now.

Another example - dead code removal

If the static analyzer code is too abstract, let's try another tool: a dead code remover. "Dead code" is simply code that doesn't really do anything and can be removed without changing the behavior of the program. Like this code:

for x in [1,2,3] do
end

With our annotating concept, we can look for this kind of code and annotate it as such:

node.metadata(:dead_code, :true) 

But marking the code as dead is only the first step - now we need to remove it. This is where it's easy to see that it makes sense to split up analysis and action part. An IDE might just want to highlight some code as dead code, but it might also offer a simple way (a Quick Fix) that removes the code.

How can this code be removed? Sure - it'd be possible to dump the node from the AST and then use Ruby2Ruby to spit out Ruby source. But that's not a very nice solution: Ruby2Ruby turns ParseTree ASTs into Ruby source code, but it loses a lot of useful information, such as formatting (white space) and comments. Dropping these is not acceptable in an IDE or refactoring/cleanup tools.

Metadata to the rescue: the nodes can be annotated with source locations - ie. where this particular node is found in the original source code. And with that - it's easy: the cleanup code simply marches through the code, finds all nodes marked with :dead_code and removes them in the source file, using the character offsets of the source location metadata. (Of course - once some nodes were removed, the offsets of the rest of the nodes are off. This can be solved by simply tracking the number of deleted characters and subtracting them from the actual offsets. With that, it's not necessary to reparse and re-analyze the source).

Conclusion

The examples in this article were focussed on language tools - but the ideas are usable for all kinds of object collections. Wherever object graphs need to be annotated, possibly in multiple passes by independently written analyzers, it's convenient to store the annotations with the nodes. If the classes of the object graph are not under the developer's control and are extensively used elsewhere (e.g. the Array class in ParseTree), singleton classes are a good choice. For other situations, open classes might be a better solution.

As for implementing the specific tools mentioned here: ParseTree is available for Ruby 1.8.x, in Rubinius and JRuby (jparsetree). The JRuby version also adds source locations for the individual nodes, so it's even easier to modify source. Have fun coming up with ideas for tools and writing them - it's easy with Ruby.

No comments

Watch Thread Reply

Educational Content

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.

Interview: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives

InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.