InfoQ Homepage Articles DSL Interaction with Program Transformation in TXL

# DSL Interaction with Program Transformation in TXL

## Introduction

The number of computer languages known currently come to hundreds growing to thousands if we add other types of languages including Domain Specific Languages (DSLs). Having several different software systems within an organization's infrastructure, an integration problem arises when these systems need to interact with each other using different languages. They can interact using something other than XML and with various syntaxes for input and output. Working with such DSLs involves resolving the issues of the source text parsing and the tasks of fetching the required information and its further processing according to the project goals. This is where program transformation techniques come to the rescue. This technique is widely used in software engineering to solve different tasks such as program translation from one language to another (e.g. C# to Java), migration between the language versions (e.g. Python 2.5 to Python 3), source code analysis (build call graph, etc.), refactoring, documentation generation (something like javadoc) and other tasks when one needs to produce specific data based on the source code.

This article discusses a proposed solution for solving the interaction problem between two DSLs. This solution was developed based on the program transformation approach using the TXL programming language. TXL is a programming language specifically designed to support computer software analysis and source transformation tasks. It allows the developers to concentrate on data analysis and information processing aspects by making the other tasks like source text parsing and results formatting almost effortless.

The use case discussed in the article is the implementation of a yUML service. It can be used to draw UML diagrams from their textual descriptions (let's call this solution TextUML for references in the article). Let's consider an example of generating a UML diagram for three classes, namely Shape, Circle and Rectangle, from a textual description shown in Listing 1.

Listing 1: Textual description showing the relationship between Shape, Circle and Rectangle

[Shape]^-[Circle] [Rectangle]-^[Shape]

The class diagram shown in Figure 1 below is generated from the above model description.

Figure 1. Class diagram showing the Image generation sample

To depict a diagram visually, some dedicated tools are needed to place rectangles and arrows between them. Graph visualization package graphviz is a good candidate for this purpose. The following statement on their website describes how Graphviz helps the developers with this technique. “The Graphviz layout programs take descriptions of graphs in a simple text language, and make diagrams in several useful formats” (including PNG which is used in this article). For image rendering a dot program from the package will be used. The diagram similar to the one shown in Figure 1 can be built from the file sample.dot by launching the command: dot -Tpng sample.dot > sample.png. The sample.dot is shown in Listing 2.

Listing 2: sample.dot with the textual description of the class diagram

digraph G {
node [ shape = "record" ]
edge [ arrowhead = "empty" ]
Circle [ label = "{Circle}" ]
Shape [ label = "{Shape}" ]
Rectangle [ label = "{Rectangle}" ]
Circle -> Shape
Rectangle -> Shape
}

The resulting diagram is shown in Figure 2 below.

Figure 2. dot command result

In general, the task of translating from TextUML source text into dot language is not trivial starting from parsers development (using special tools like lex/yacc, ANTLR and so on or manually) and finishing with transforming the received data for further processing. Let's see how this can be solved with TXL.

First, we need to parse the input text in TextUML, gather useful facts (information about classes and associations between them) and generate from those facts a program in dot language. In the scope of the article let's bound the TextUML language to have only the inheritance association and simple attributes and methods declaration. And to keep the sample application simple for the demo purposes, we will provide the least validity of generated images not taking into account their visual details. TextUML allows the following expressions:

 [Shape | x; y | draw() ]^-[Box] Box inherits Shape, has attributes x and y, method draw() [Triangle]-^[Shape] Triangle inherits Shape, for a class declaration the first match of a class description used, one doesn't need to mention attributes and methods for the second time [Shape]^[Circle | r | ] Сircle inherits Shape and Circle has attribute r

Figure 3 below describes the transformation sequence in TXL.

Figure 3. Transformation sequence

For parsing we will define the input language TextUML grammar in Backus-Naur Form (BNF) which is a metasyntax used to describe formal languages (including programming languages and DSLs). This is shown in Listing 3 below.

Listing 3. TextUML grammar in BNF form

<uml_program>                   ::= {<class_relation>}
<class_relation>		::= <class_definition> <inheritance_association>
<class_definition>              ::= '[' <id> [<class_attributes_methods>] ']'
<class_definition>		::= '[' <id> [<class_attributes_methods>] ']'
<class_attributes_methods>	::= <class_attributes> <class_methods>
<class_attributes>		::= '|' [<id> {; <id>}]
<class_methods>			::= '|' [<id>() {; <id>()}]
<inheritance_association>	::= ^ | ^- | -^
<id>				::= {A-Z | a-z | 0-9 | _ }+ 

And from the full dot language grammar, outline the subset which is enough for our UML class diagram inheritance relation displaying task. Listing 4 below shows this script.

Listing 4. UML Class Diagram Inheritance Relationship

<dot_program>      ::= digraph <id> '{' {<stmt_list>} '}'
<stmt_list>        ::= <node_stmt> | <edge_stmt> | <attr_stmt> | {<attr_assignment>}
<attr_stmt>        ::= (node | edge) <attr_list>
<node_stmt>        ::= <id> <attr_list>
<attr_list>        ::= '[' {<attr_assignment>} ']'
<attr_assignment>  ::= <id> '=' <literal>
<edge_stmt>        ::= <id> '->' <id>
<literal>          ::= <string> | <number>
<id>               ::= {A-Z | a-z | 0-9 | _ }+
<string>           - double quoted string "any q2"
<number>           - unsigned integer or real number — 3, 435.23 

## Transformation problem

### Transformation maps

To design the future transformation and make translation rules creation easier, let's use correspondence maps to define the way how translations will be made. The following table shows sample TextUML to dot transformations.

 TextUML dot [?1]-^[C2] digraph G { node [ shape = "record" ] edge [ arrowhead = "empty" ] C1 [ label = "{C1}" ] C2[ label = "{C2}" ] C1 -> C2 } [Shape | x;y | draw()]^[Circle] digraph G { node [ shape = "record" ] edge [ arrowhead = "empty" ] Circle [ label = "{Circle}" ] Shape [ label = "{Shape| x\ly | draw()}" ] Circle -> Shape }

As it is seen from the above table the information about identifiers, attributes and methods of classes participating in an inheritance is extracted and then the corresponding graph description is generated. Bold font depicts extracted facts and their place in input and output texts.

### DSL grammars

For the transformation development in TXL language for our task, we first define the grammar for both TextUML and dot languages. TXL grammar definition resembles BNF and the following table shows correspondence between BNF and TXL grammars definitions.

 ::= {} define uml_program      [repeat class_relation] end define ::= define class_relation      [class_definition] [association] [class_definition] [NL] end define ::= '[' [] ']' define class_definition     '[ [id] [opt class_attributes_methods] '] end define

The TextUML (uml.Grm) and dot (dot.Grm) grammar files are shown in Listings 5 and 6 below.

Listing 5. TextUML grammar file

#pragma -esc \
compounds      -^ ^- end compoundsdefine uml_program     [repeat class_relation] end definedefine class_relation     [class_definition] [association] [class_definition] [NL] end definedefine class_definition     '[ [id] [opt class_attributes_methods] '] end definedefine class_attributes_methods     [class_attributes] [class_methods] end definedefine class_attributes     '| [repeat class_attribute]end definedefine class_attribute     [id] [attr ';] end definedefine class_methods     '| [repeat class_method] end definedefine class_method     [SPOFF] [id] '( ') [SPON] [attr ';] end definedefine association     [inheritance_association] end definedefine inheritance_association     '^ | '^- | '-^ end define

Listing 6. dot grammar file

keys
'digraph 'node 'edge
end keys compounds      '-> end compoundsdefine program      [dot_program] end redefinedefine dot_program      'digraph [id] '{ [NL] [IN] [repeat stmt_list] [NL] [EX] '} end definedefine stmt_list      [attrib_stmt] | [repeat attrib_assignment] | [node_stmt] | [edge_stmt] end definedefine node_stmt      [id] [attrib_list] end definedefine attrib_stmt      attrib_head] [attrib_list] end definedefine attrib_head      'node | 'edge end definedefine attrib_list      '[ [NL] [IN] [repeat attrib_assignment]'] [NL] [EX] end definedefine attrib_assignment      [id] '= [literal] [NL] end definedefine literal      [stringlit] | [number] end definedefine edge_stmt      [id] '-> [id] [NL] end define

### Transformation workflow

Transformation consists of the following steps:

1. parse the UML diagram template program in dot language
2. parse the TextUML input text
3. prepare TextUML text for further processing
4. extract useful information about classes and their relations from TextUML text
5. add to the dot language program UML classes nodes declarations and inheritance relations between them
6. generate resulting program text in dot language

Input text is parsed by TXL automatically and user interested in transformation process gets the parse trees for further processing. For example Figure 4 below shows the parse trees for the first example from the above correspondence map which also shows how the information about a class identifier is transferred between trees. Program text is also generated automatically by TXL so our task is to implement steps from 3 to 5.

Figure 4. Parse trees and node orrespondence

Let's see how text preparation (step 3) works. It includes the following actions:

• remove relations in which a class inherits itself
• transform inheritance relations noted with -^, ^- to ^ (e.g. [C1]-^[C2] becomes [C2]^[C1]) to simplify the upcoming jobs

### A bit of TXL explained

The full transformation program text can be found in the sample application archive provided with the article (listed in the References section). Let's cover some particular parts to understand the whole job. We won't discuss the TXL syntax in detail but will look at some general ideas of its work.

In general, functions and rules in TXL define some parse tree replacement with another tree. The word replace defines the type and the pattern of the replaced tree and word by defines the tree with which the original tree will be replaced.

To remove self-inheritance constructions let's take the original list of [class_relations] constructions and build a new list enrolling there only those constructions which have no inheritance relations with the same class identifier. Subsequently for each construction [class_relation] removeSelfInheritance function is called. The function deconstructs the tree [class_relation] into parts and compares class identifiers. If classes are not equal the construction is added to the new list. Here is the simpler function lrInheritanceToDefault which replaces -^ relation with ^.

Listing 7: Function to replace the class relationship type

rule lrInheritanceToDefault
replace [class_relation]
C1 [class_definition] '-^ C2 [class_definition]
by
C2 '^ C1
end rule

In this function it is defined that the rule is applied to the [class_relation] node and the tree pattern to which the rule will be applied is [class_definition] '-^ [class_definition], i.e. the inheritance construction we are looking for.

The line C1 [class_definition] '-^ C2 [class_definition] defines assignments of left and right [class_definition] constructions to variables C1 and C2 accordingly. Then the replacement pattern C2 ^ C1 defines the new view of [class_relation] construction with ^ inheritance operation and classes declaration changed their places with each other according to the new operation syntax. The expression [Circle]-^[Shape] after applying the lrInheritanceToDefault function becomes [Shape]^[Circle].

### Transformation sources

dot.Txl file shows the full transformation code. This source code for this file can be found in the sample application zip file provided with this article. Now let's prepare the dot language template (graph0.dot file) for diagram generation which is shown in Listing 8 below.

Listing 8: dot language template for diagram generation

digraph G {
rankdir = "BT"
node [ shape = "record" ]
edge [ arrowhead = "empty" ]
} 

Later nodes (classes) and arrows between them (inheritance) will be added to this text. The text from graph0.dot is referred as a [dot_program] parse tree defined in dot.Grm, and TextUML program text as a [uml_program] from uml.Grm accordingly.

The command to launch the transformation is “txl graph0.dot -o out.dot - -umlfile text.uml” where graph0.dot is the template shown above, out.dot is the resulting dot language text to generate the diagram image, and text.uml is the diagram text in TextUML language.

### Transformation Details

Now let's see how the whole transformation works. Step 1 is made automatically and [program] tree for graph0.dot comes as an input to the main function which transfers the processing to doTransform function. The latter implements step 2 by parsing the file supplied with command line argument -umlfile. Step 3. The parse tree [uml_program] from the previous step is deconstructed into its parts giving a list of [class_relation]:

deconstruct UmlP
CRO [repeat class_relation] 

Then the described earlier removeselfinheritance, rlInheritanceToDefault and lrInheritanceToDefault operations are applied:

construct CR [repeat class_relation]
_ [removeSelfInheritance each CRO] [rlInheritanceToDefault] [lrInheritanceToDefault] 

DefinedClasses variable will hold the list of already declared classes to avoid redifining the classes.

After that the steps 4 and 5 are implemented with functions addClasses and addEdges which add classes and edges between them to the output parse tree [dot_program]. From this tree later TXL will generate the out.dot file.

Let's see in more detail how addEdges function works. It is called subsequently for each class relation from text.uml file (note the function invoked with the each key in doTransform). This is shown in Listing 9.

function addEdges CR [class_relation]
replace * [repeat stmt_list]
Stms [repeat stmt_list]
deconstruct CR
'[ C1 [id] _ [opt class_attributes_methods] '] '^ '[ C2 [id] _ [opt class_attributes_methods] ']
construct E [stmt_list]
C1 '-> C2
by
Stms [. E]
end function 

The function is applied to the first found (symbol * means find the first occurrence of the pattern) [repeat stmt_list] subtree found in [dot_program]. The tree [class_relation] supplied as a parameter CR is broken into parts:

deconstruct CR
'[ C1 [id] _ [opt class_attributes_methods] '] '^ '[ C2 [id] _ [opt class_attributes_methods] ']

where variables C1 and С2 get predecessor and ancestor class identifiers at once respectively.

After that a new [stmt_list] tree is made. In this case it is the [edge_stmt] element which is recognized by TXL automatically:

construct E [stmt_list]
C1 '-> C2 

This results E variable to have the tree shown in Figure 5 in it.

Figure 5. Е variable contents

Step 6 is performed automatically after the user tree processing operations are finished. To see the diagram image from the generated out.dot file, use the following command: “dot -Tpng out.dot > out.png” and open the resulting out.png file.

To launch transformations you will need to download the TXL distribution for your platform and install it. Create a separate directory called dsltransform for launching the operations from the article and create a subdirectory called Txl in it. Put grammar and transformation files dot.Grm, uml.Grm, dot.Txl in Txl subdirectory. Also install graphviz package for your platform (also available from a system package manager for Linux/Unix systems). Configure your environment variables to allow txl (txl.exe) and dot (dot.exe) commands to be launched from the command line or use the full path to them during invocation. All jobs have to be performed inside dsltransform directory.

## Conclusions

The use case discussed in the article is one of the several approaches for DSL processing which allows to simplify the task of useful information extraction from the text and its further processing. This approach might be difficult to apply if a DSL cannot be described with a context-free grammar. But such situations appears not so often in a real life project and possibly some changes in the DSL syntax might solve the issue. In other cases this solution might significantly simplify the work with text, excluding the need to develop parsers, because parsing job is made automatically by TXL. It can also work as a syntax checker for the input DSL showing the places where syntax is incorrect. For example you can try to feed a wrong TextUML text to TXL and see it shows the input error.

Also it's worth to mention again that program transformation has more applications. The problem discussed in the article is a general application of tools similar to TXL to translate a program from one language to another. It might happen when one works with legacy software (there are proven industrial projects made by TXL converting millions line of code), migrate between language versions, creating bindings to a C library for some scripting language (something similar to SWIG), etc.

Another interesting application of program transformation is the code analysis. You think your source code as a data source and make query to find something about your code. These code analysis questions can be something like: Is BankAccount::withdraw called only from Customer class, are there any returns inside the if statement, which method has variables defined in the middle of its body and not at the beginning? You can find answers to similar questions using TXL. For example it was used for Y2K analysis and conversions of over 3.5 billion lines at IBM Global Services.

TXL is not the only language providing similar functionality. There are other tools available, free and commercial, such as Stratego/XT, ANTLR, and DMSToolkit. Also XSLT uses similar approaches.

Mykhaylo Sorochan holds a Ph.D. degree in computer aided design, currently works as a Ruby developer at Sphere Consulting, Inc. and gives lectures at International Solomon University on software development. In his spare time, he likes to discover new programming languages, approaches and paradigms about development process.

Style

## Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

• ##### Nice.

by Luis Espinal /

• ##### Great one

by slava pocheptsov /

• ##### Nice.

by Luis Espinal /

Your message is awaiting moderation. Thank you for participating in the discussion.

This could be a step in the right direction. I've been thinking for the longest time that dot (or a dot-like language) would be an excellent way to define meta-languages, or interface definition languages.

• ##### Great one

Your message is awaiting moderation. Thank you for participating in the discussion.

Thank you for the article, it give some way to visualize structured data in a simple and well known for developer dot manner.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Is your profile up-to-date? Please take a moment to review and update.

Note: If updating/changing your email, a validation request will be sent

Company name:
Company role:
Company size:
Country/Zone:
State/Province/Region:
You will be sent an email to validate the new email address. This pop-up will close itself in a few moments.