DSL Evolution

Dec 22, 2009

Introduction

A domain-specific language (DSL) is a programming language targeted at a particular problem domain, rather than a general-purpose language. DSLs are a great way of creating DRYer code that is "business user readable". Over the last few years, many articles about DSLs have been published in the general programming world.

Creating a domain specific language isn't hard. However, our understanding of the domain continues to evolve, so for a DSL to stay useful over the long run, we need a strategy for evolving it. If you are undertaking a large project or developing a software product line (SPL) where you will be using your DSLs over a substantial period of time, you had better have an idea of how you're going to handle their evolution.

From backward compatibility through versioning to automated transformation of statements, this article looks at approaches to structuring your domain specific languages so that they are simpler to evolve over time.

Avoiding the Problem

In the 3GL world, language designers are very aware of the importance of backward compatibility. Whatever happens in the next version of Java, it is very unlikely to break code that relies on features of earlier versions. With DSLs, however, our understanding of the domain can change radically as we continue to explore the problem space. On a single project, business experts often mention new domain concepts late in the project that force you to fundamentally reassess how best to model the domain. In a software product line, each new project brings different requirements that can affect the optimal design of the DSLs used to describe them.

Historically in the Domain Specific Modeling (DSM) world we've tried to minimize these issues by recommending that DSLs only be introduced when the business rules change frequently (to provide a Return On Investment (ROI) for the overhead of developing the DSLs and the associated tooling) but where the structure of the domains is fairly static (to minimize the problems encountered when you start to evolve the structure of the DSLs). However, as DSLs are being used more widely, it's important to understand the issues associated with DSL evolution and some of the strategies for handling those issues.

What's the Problem?

Some types of DSL evolution are not a problem at all. If you want to add a new domain concept or a new optional attribute to a concept, you're simply extending the DSL grammar, and no existing statements will break. However, other changes to the grammar require you to consider their implications for your existing statements. Some of these changes are:

Remove a concept/attribute
Add an attribute that requires a value that may be different for different statements
Turn an attribute into a separate concept with its own attributes
Add a new constraint that some of your existing statements may not meet
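
The last case is easy to check mechanically before adopting the constraint. The following is a minimal sketch, assuming the XML concrete syntax used in the examples later in this article; the `<domainObjects>` wrapper element and the "at least one property" constraint are hypothetical, chosen only to show how existing statements that fail a new constraint can be listed.

```python
import xml.etree.ElementTree as ET

# Existing statements, wrapped in a hypothetical <domainObjects> root so they
# parse as a single XML document.
STATEMENTS = """
<domainObjects>
    <domainObject name="User">
        <properties>
            <property name="FirstName" />
            <property name="LastName" />
        </properties>
    </domainObject>
    <domainObject name="Product" />
</domainObjects>
"""


def has_at_least_one_property(domain_object):
    """Hypothetical new constraint: every domainObject declares at least one property."""
    return domain_object.find("properties/property") is not None


def find_violations(root, constraint):
    """Return the names of domainObjects that fail the given constraint."""
    return [obj.get("name") for obj in root.iter("domainObject") if not constraint(obj)]


root = ET.fromstring(STATEMENTS.strip())
for name in find_violations(root, has_at_least_one_property):
    print(f"{name} does not satisfy the new constraint")
```

Here only Product is reported, so someone has to decide what its properties should be before the constraint can be enforced.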

Abstract Grammar vs. Concrete Syntax

One important DSL concept that is going to make the rest of this discussion a little easier to follow is the difference between abstract grammar and concrete syntax. The abstract grammar of a DSL describes the structure of valid statements - including any associated constraints. The concrete syntax describes the details of exactly how you write (or draw) statements in the DSL.

For example, let's say I have a DSL for describing a state machine; it might include the concept that an Object can have many States. The abstract grammar will just convey that an Object can have many States, along with any constraints, such as that each object must have at least one state and that each state must have a unique name within its object. The concrete syntax might be a drawing in a modeling tool, an XML document, code in an internal DSL in Groovy or Ruby, a spreadsheet, a database schema for database-based DSLs or a custom textual syntax. While some elements of DSL evolution are more challenging in certain concrete syntaxes (such as handling the positional and graphical data associated with graphical languages), we can discuss most of the issues and implications of DSL evolution by looking at the abstract grammar, understanding that the concrete syntax we use to express statements in that grammar is a secondary concern.
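
To make the distinction concrete, here is a small sketch in which the abstract grammar (an Object with many States, plus the two constraints above) is a pair of Python dataclasses, and two different concrete syntaxes are parsed into it. The `<object>`/`<state>` XML element names and the one-line textual syntax are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class State:
    name: str


@dataclass
class StatefulObject:
    """Abstract grammar: an Object can have many States."""
    name: str
    states: list = field(default_factory=list)

    def validate(self):
        """Return the constraint violations for this object (empty if valid)."""
        errors = []
        if not self.states:
            errors.append(f"{self.name}: must have at least one state")
        names = [state.name for state in self.states]
        for duplicate in sorted({n for n in names if names.count(n) > 1}):
            errors.append(f"{self.name}: duplicate state name '{duplicate}'")
        return errors


def from_xml(text):
    """Concrete syntax 1 (hypothetical element names): an XML document."""
    root = ET.fromstring(text)
    return StatefulObject(root.get("name"), [State(s.get("name")) for s in root.iter("state")])


def from_text(line):
    """Concrete syntax 2 (hypothetical): 'Order: Open, Paid, Shipped'."""
    name, _, rest = line.partition(":")
    return StatefulObject(name.strip(), [State(s.strip()) for s in rest.split(",") if s.strip()])


xml_object = from_xml('<object name="Order"><state name="Open"/><state name="Paid"/></object>')
text_object = from_text("Order: Open, Paid")
print(xml_object == text_object)  # True: the same abstract statement
print(from_text("Order: Open, Open").validate())
```

Because the constraints live on the abstract model, a grammar change such as a new constraint is written once, whichever concrete syntax the statements happen to use.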

Approaches to Handling DSL Evolution

There are three common approaches to handling DSL evolution in systems where you have several existing statements:

requiring backward compatibility
versioning the languages
automating statement transformation

Backward Compatibility

The simplest way of handling the problem is to avoid it: never allow changes to the DSL grammar that could break existing statements. This is often the approach people take when they start working with DSLs, and it works for a while. Eventually, though, use cases will come up where you really want to evolve the DSL in a way that may break existing statements.
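
One way to enforce this policy is to run every existing statement through both the current and the proposed validation rules and reject the grammar change if any statement that is valid today would become invalid. The sketch below assumes the XML syntax from the examples later in this article; the validator functions and the required `dataType` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET


def valid_v1(prop):
    """Current grammar: a property needs only a name."""
    return prop.get("name") is not None


def valid_v2_optional(prop):
    """Proposed change A: an optional validationRule attribute (must be non-empty if present)."""
    rule = prop.get("validationRule")
    return prop.get("name") is not None and (rule is None or rule != "")


def valid_v2_required(prop):
    """Proposed change B (hypothetical): a dataType attribute becomes required."""
    return prop.get("name") is not None and prop.get("dataType") is not None


def breaking_statements(props, old_rule, new_rule):
    """Names of properties that are valid today but would be invalid after the change."""
    return [p.get("name") for p in props if old_rule(p) and not new_rule(p)]


props = list(ET.fromstring(
    '<properties><property name="FirstName"/>'
    '<property name="Price" dataType="decimal"/></properties>'
).iter("property"))

for label, rule in [("optional validationRule", valid_v2_optional),
                    ("required dataType", valid_v2_required)]:
    broken = breaking_statements(props, valid_v1, rule)
    verdict = "accepted" if not broken else "rejected, breaks " + ", ".join(broken)
    print(f"{label}: {verdict}")
```

Adding an optional attribute passes the check, while making an attribute required is rejected as soon as one existing property lacks it.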

Language Versioning

The quickest way to handle DSL changes that may break existing statements is to version the DSLs. Whether you rely on the host language to parse an internal DSL or use an explicit parser for an external DSL (perhaps built with ANTLR or Xtext, or an XML parser if you use an XML concrete syntax), when you need to make a breaking change you can release a new version of the parser and make sure your system supports multiple versions, upgrading individual statements only when they need features of the expanded grammar available in a later version. This approach can work pretty well up to a point, but eventually the overhead of maintaining, supporting and debugging multiple versions of the language can become burdensome.
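
A minimal sketch of version dispatch, assuming the XML syntax from the examples below and a hypothetical `version` attribute on the root element (unversioned statements are treated as version 1). Version 1 allows a single `validationRule` attribute; version 2 reads a comma-delimited `validationRuleList`.

```python
import xml.etree.ElementTree as ET


def parse_v1(root):
    """Version 1 grammar: at most one rule in property@validationRule."""
    rules = {}
    for prop in root.iter("property"):
        rule = prop.get("validationRule")
        rules[prop.get("name")] = [rule] if rule else []
    return rules


def parse_v2(root):
    """Version 2 grammar: comma-delimited rules in property@validationRuleList."""
    rules = {}
    for prop in root.iter("property"):
        raw = prop.get("validationRuleList", "")
        rules[prop.get("name")] = [r.strip() for r in raw.split(",") if r.strip()]
    return rules


# Every supported language version keeps its own parser.
PARSERS = {"1": parse_v1, "2": parse_v2}


def parse(text):
    root = ET.fromstring(text)
    # Hypothetical convention: a version attribute on the root element,
    # with unversioned statements treated as version 1.
    version = root.get("version", "1")
    if version not in PARSERS:
        raise ValueError(f"Unsupported DSL version: {version}")
    return PARSERS[version](root)


old = parse('<domainObject name="User"><properties>'
            '<property name="LastName" validationRule="Required"/>'
            '</properties></domainObject>')
new = parse('<domainObject name="User" version="2"><properties>'
            '<property name="LastName" validationRuleList="Required,maxlength=50"/>'
            '</properties></domainObject>')
print(old, new)
```

Each entry in `PARSERS` is code that has to be maintained, tested and debugged for as long as any statement still uses that version, which is where the overhead described above comes from.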

Statement Transformation Automation

The ideal solution is to evolve your DSL statements automatically, so that as you make changes to the grammar, existing statements are upgraded (where possible). One of the easiest ways of doing this is to express each change to your grammar as a transformation and then write scripts that apply the same change to your DSL statements, using either a scripting language or a language such as XSLT or ATL (the ATLAS Transformation Language) that supports Model-to-Model (M2M) transformations.
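
Applied systematically, this becomes a migration pipeline: each grammar change is paired with a statement transformation, and statements are upgraded by running every pending transformation in order. The sketch below is one possible shape for this, not a standard tool; the `version` attribute, `MIGRATIONS` registry and `LATEST_VERSION` constant are hypothetical, and the single example migration is the "rename attribute" change discussed in the examples that follow.

```python
import xml.etree.ElementTree as ET


def rename_attribute(tag, old, new):
    """Build a statement transformation mirroring a 'rename attribute' grammar change."""
    def transform(root):
        for element in root.iter(tag):
            if old in element.attrib:
                element.set(new, element.attrib.pop(old))
    return transform


# Hypothetical registry: MIGRATIONS[n] upgrades statements from grammar version n to n + 1.
MIGRATIONS = {
    1: rename_attribute("property", "validationRule", "validationRuleList"),
}
LATEST_VERSION = 2


def upgrade(root):
    """Apply every pending migration in order and stamp the resulting version."""
    version = int(root.get("version", "1"))
    while version < LATEST_VERSION:
        MIGRATIONS[version](root)
        version += 1
    root.set("version", str(version))
    return root


doc = ET.fromstring('<domainObject name="User"><properties>'
                    '<property name="LastName" validationRule="Required"/>'
                    '</properties></domainObject>')
print(ET.tostring(upgrade(doc), encoding="unicode"))
```

Because each statement records its version, running `upgrade` on an already current statement does nothing, so the pipeline can be rerun safely over a whole repository of statements.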

Example Transformations

Let's imagine we're using an XML concrete syntax for a DSL that describes the domain objects of an application. To start with, we might have two domain classes, User and Product, as shown in Listing 1 below.

Listing 1: XML Syntax for User and Product domain objects.

<domainObject name="User" />
<domainObject name="Product" />

We decide pretty quickly that we want to add the concept of a Property, and the idea that a domainObject can have 0..n properties. This change won't break any existing statements. It involves only an "add concept" and an "add optional relationship" transformation: we have added a new concept (property) and a new optional relationship (a domainObject *can* have any number of properties but doesn't have to have any). As shown in Listing 2, we can now express statements using the Property concept:

Listing 2: Domain objects with properties.

<domainObject name="User">
    <properties>
        <property name="FirstName" />
        <property name="LastName" />
    </properties>
</domainObject>

<domainObject name="Product">
    <properties>
        <property name="Title" />
        <property name="Price" />
    </properties>
</domainObject>

Now let's add the idea that properties can have a validation rule. This time we apply an "add optional attribute" transformation, adding an optional validationRule attribute to property. Because the attribute is optional, all of the previous statements are still valid, so this change to the grammar doesn't require any changes to existing statements.

Let's say we work with that for a while and end up with the XML shown in Listing 3.

Listing 3: Domain object properties with validation rules.

<domainObject name="User">
    <properties>
        <property name="FirstName" />
        <property name="LastName" validationRule="Required" />
    </properties>
</domainObject>

<domainObject name="Product">
    <properties>
        <property name="Title" validationRule="maxlength=50"/>
        <property name="Price" validationRule="isNumeric" />
    </properties>
</domainObject>

But now we realize that in some cases we need to associate multiple validation rules with a single property. There are lots of ways of solving this problem; let's look at one approach. Firstly, we might just change validationRule to be a comma-delimited list of validation rules. This change wouldn't require any changes to existing statements (assuming none of the current validation rules contain a comma). However, the language now has a misleading attribute name, as nothing about validationRule suggests that it accepts a comma-delimited list of rules.
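
The parenthetical assumption is worth verifying before reinterpreting the attribute. This sketch scans the statements for existing rules that already contain a comma and would therefore be misread as several rules; the `<domainObjects>` wrapper and the `Tags` property with its `oneOf=a,b` rule are hypothetical, added to show what a failing case looks like.

```python
import xml.etree.ElementTree as ET

STATEMENTS = """<domainObjects>
  <domainObject name="User"><properties>
    <property name="LastName" validationRule="Required" />
  </properties></domainObject>
  <domainObject name="Product"><properties>
    <property name="Title" validationRule="maxlength=50" />
    <property name="Tags" validationRule="oneOf=a,b" />
  </properties></domainObject>
</domainObjects>"""


def ambiguous_rules(root):
    """Rules that already contain a comma, so they would be misread once
    validationRule is treated as a comma-delimited list."""
    return [
        (obj.get("name"), prop.get("name"), prop.get("validationRule"))
        for obj in root.iter("domainObject")
        for prop in obj.iter("property")
        if "," in prop.get("validationRule", "")
    ]


def rule_list(prop):
    """Read validationRule under the new comma-delimited interpretation."""
    return [r.strip() for r in prop.get("validationRule", "").split(",") if r.strip()]


root = ET.fromstring(STATEMENTS)
print(ambiguous_rules(root))
```

If the report is empty, the new interpretation is safe to adopt without touching any statements; otherwise those rules need to be rewritten first.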

The next step might be to apply a "rename attribute" transformation. By renaming property:validationRule to property:validationRuleList, you get a more semantically meaningful attribute name. To do this, you need a way of applying such transformations to existing statements. The concrete syntax you use will affect the best approach, but (for example) these sorts of transformations are fairly easy to apply to an XML concrete syntax, or to anything that can be transformed to and from an XML projection.

We continue to evolve the application, and unfortunately we find that our validation rules require more sophisticated parameterization. For example, when a user registers on the website, the Password and PasswordConfirmation properties must match. One way of writing this would be a validation rule like the XML snippet shown in Listing 4.

Listing 4: Validation rule with Password and Password Confirmation properties.

<validationRule name="PasswordMatchesConfirmation" type="propertyValuesMatch" 
    firstPropertyName="Password" secondPropertyName="PasswordConfirmation" />

Now we need to apply an "attribute to associated concept" transformation. Let's break that down. Firstly, it's an "attribute to concept" transformation, as we're taking the attribute property:validationRule and replacing it with a separate validationRule concept, which (in an XML concrete syntax) is expressed as a separate XML element. It's an "attribute to *associated* concept" transformation because I'm deciding to have a separate section in my language for describing rules, which can then be reused by different properties. For example, if both first name and last name are required properties, both can use the same "Required" validation rule. The alternative approach, which is a better fit in some cases, is an "attribute to *composed* concept" transformation, where the rules are composed within each property.

With an attribute to *composed* concept transformation, the statements would have become those shown in Listing 5 below.

Listing 5: Domain object with "attribute to *composed* concept" transformation.

<domainObject name="User">
    <properties>
        <property name="FirstName" />
        <property name="LastName">
            <validationRule name="Required" />
        </property>
    </properties>
</domainObject>

<domainObject name="Product">
    <properties>
        <property name="Title">
            <validationRule name="maxlength" value="50" />
        </property>
        <property name="Price">
            <validationRule name="isNumeric" />
        </property>
    </properties>
</domainObject>
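
A sketch of the composed transformation, assuming the statements still carry the rules in a (possibly comma-delimited) validationRule attribute as in Listing 3, and assuming the convention that a rule such as `maxlength=50` is split at the first `=` into a name and a value, as Listing 5 shows. The function name and `attribute` parameter are illustrative, not part of any tool.

```python
import xml.etree.ElementTree as ET


def to_composed_rules(root, attribute="validationRule"):
    """Attribute to composed concept: replace property@<attribute> with
    <validationRule> child elements nested inside each property."""
    # Materialize the list first because we add children while walking the tree.
    for prop in list(root.iter("property")):
        raw = prop.attrib.pop(attribute, None)
        if raw is None:
            continue  # property had no rules
        for rule in (r.strip() for r in raw.split(",")):
            if not rule:
                continue
            # "maxlength=50" becomes name="maxlength" value="50".
            name, sep, value = rule.partition("=")
            child = ET.SubElement(prop, "validationRule", name=name)
            if sep:
                child.set("value", value)
    return root


product = ET.fromstring('<domainObject name="Product"><properties>'
                        '<property name="Title" validationRule="maxlength=50"/>'
                        '<property name="Price" validationRule="isNumeric"/>'
                        '</properties></domainObject>')
print(ET.tostring(to_composed_rules(product), encoding="unicode"))
```

Passing `attribute="validationRuleList"` would handle statements that have already been through the rename step.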

Listing 6 below shows the result of an attribute to *associated* concept transformation.

Listing 6: Domain object with "attribute to *associated* concept" transformation.

<domainObject name="User">
    <properties>
        <property name="FirstName" />
        <property name="LastName" validationRuleNameList="Required" />
    </properties>
    <validationRules>
        <validationRule name="Required" />
    </validationRules>
</domainObject>

<domainObject name="Product">
    <properties>
        <property name="Title" validationRuleNameList="TitleMaxlength" />
        <property name="Price" validationRuleNameList="isNumeric" />
    </properties>
    <validationRules>
        <validationRule name="TitleMaxlength" value="50" />
        <validationRule name="isNumeric" />
    </validationRules>
</domainObject>

Again, these are transformations that can be applied automatically to existing DSL statements.
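
For instance, the associated variant could be scripted as follows, starting from statements shaped like Listing 3. This is a sketch under stated assumptions: unparameterized rules such as `Required` are shared by name, while a parameterized rule is given a per-property name built from the property name and rule name (`Title` + `maxlength` gives `TitleMaxlength`), a hypothetical convention chosen to reproduce Listing 6.

```python
import xml.etree.ElementTree as ET


def to_associated_rules(domain_object, attribute="validationRule"):
    """Attribute to associated concept: collect rules into a shared
    <validationRules> section and reference them by name from each property."""
    rules = {}  # rule name -> parameter value (None when the rule takes no parameter)
    for prop in list(domain_object.iter("property")):
        raw = prop.attrib.pop(attribute, None)
        if raw is None:
            continue
        names = []
        for rule in (r.strip() for r in raw.split(",") if r.strip()):
            kind, sep, value = rule.partition("=")
            if sep:
                # Parameterized rules get a per-property name (hypothetical
                # convention matching Listing 6): Title + maxlength -> TitleMaxlength.
                name = prop.get("name") + kind[:1].upper() + kind[1:]
                rules[name] = value
            else:
                # Unparameterized rules such as "Required" are shared by name.
                name = kind
                rules.setdefault(name, None)
            names.append(name)
        prop.set("validationRuleNameList", ",".join(names))
    if rules:
        section = ET.SubElement(domain_object, "validationRules")
        for name, value in rules.items():
            element = ET.SubElement(section, "validationRule", name=name)
            if value is not None:
                element.set("value", value)
    return domain_object


user = ET.fromstring('<domainObject name="User"><properties>'
                     '<property name="FirstName"/>'
                     '<property name="LastName" validationRule="Required"/>'
                     '</properties></domainObject>')
print(ET.tostring(to_associated_rules(user), encoding="unicode"))
```

As in Listing 6, the generated TitleMaxlength rule records that it is a maxlength rule only through its name; adding an explicit type attribute, as Listing 4 does, would make each rule self-describing.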

The Limits of Automation

Of course, some transformations cannot be made automatically. When you want to apply a "delete concept" or "delete attribute" transformation, it's quite possible to scan existing statements automatically, but you then have two options. Either you make manual changes based on a report that the transformation script provides, or you use "deprecate" rather than "delete" transformations, where your tools might nag you whenever they see the deprecated items and might not let you add new ones, but won't force you to remove the existing ones.
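
The reporting half of this can be sketched as a read-only scan. Here the deprecation list, the `deprecation_report` function and the `user.xml` source label are all hypothetical; the point is that the statements are left unchanged while every use of a deprecated attribute is listed for a person to act on.

```python
import xml.etree.ElementTree as ET

# Hypothetical deprecation list: element tag -> attributes the grammar now deprecates.
DEPRECATED_ATTRIBUTES = {"property": {"validationRule"}}


def deprecation_report(root, source):
    """Report every use of a deprecated attribute; the statements are not modified."""
    findings = []
    for element in root.iter():
        for attribute in sorted(DEPRECATED_ATTRIBUTES.get(element.tag, ())):
            if attribute in element.attrib:
                findings.append(
                    f"{source}: <{element.tag} name={element.get('name')!r}> "
                    f"uses deprecated attribute '{attribute}'"
                )
    return findings


user = ET.fromstring('<domainObject name="User"><properties>'
                     '<property name="FirstName"/>'
                     '<property name="LastName" validationRule="Required"/>'
                     '</properties></domainObject>')
for line in deprecation_report(user, "user.xml"):
    print(line)
```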

Equally, if you want to apply an "add required attribute" transformation (perhaps to add a data type attribute to every property), then unless you can come up with a default (everything is a string unless I say otherwise) or some intelligent scripting rules, the best an automated tool could offer would be an efficient UI for populating the value in all of the historic statements.
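
Both options can be combined in one transformation that fills in what it can and reports the rest for manual entry. In this sketch, the `dataType` attribute name, the `"number"` value and the inference heuristic are hypothetical; `default_to_string` is the "everything is a string" default mentioned above.

```python
import xml.etree.ElementTree as ET


def add_required_attribute(root, tag, attribute, infer):
    """Populate a newly required attribute on every <tag> element.

    infer(element) returns a value, or None when no sensible value can be
    derived; those elements are left untouched and reported for manual entry.
    """
    needs_review = []
    for element in root.iter(tag):
        if attribute in element.attrib:
            continue  # already populated by hand
        value = infer(element)
        if value is None:
            needs_review.append(element.get("name"))
        else:
            element.set(attribute, value)
    return needs_review


def default_to_string(prop):
    """'Everything is a string unless I say otherwise.'"""
    return "string"


def infer_from_rules(prop):
    """Hypothetical heuristic: only a numeric validation rule implies a type."""
    return "number" if "isNumeric" in prop.get("validationRule", "") else None


product = ET.fromstring('<domainObject name="Product"><properties>'
                        '<property name="Title" validationRule="maxlength=50"/>'
                        '<property name="Price" validationRule="isNumeric"/>'
                        '</properties></domainObject>')
print(add_required_attribute(product, "property", "dataType", infer_from_rules))
```

The returned list of property names is exactly the set of entries an editing UI would need to present to a person.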

Internal v. External DSLs

It's important to be aware of the limitations of internal DSLs. They provide much of the benefit of DSLs in terms of being "end user readable", but automatically applying transformations to statements in an internal DSL embedded within another language is not an easy problem to solve in the general case. Internal DSLs are great, but make sure you have a strategy for transforming their statements if you're going to use them extensively in a large project or a software product line. Otherwise, a little time spent creating the tooling for an external DSL may well pay for itself over the life of a project.

Conclusions

Perhaps the most important takeaway from all of this is that if your DSLs are successful, you're going to end up with lots of statements written in them. If you then need to evolve your DSLs, you had better have a strategy for transforming those statements.

It's also important to realize that this is not a solved problem. There is a lot of work going on in this space, but with the exception of MetaEdit+ from MetaCase, even most Domain Specific Modeling tools don't do a great job of handling meta-model evolution.

About the Author

Peter Bell is CEO/CTO of SystemsForge. He has developed a Software Product Line for generating custom web applications that blends feature modeling, product line engineering and domain specific modeling. He writes and presents internationally on domain specific modeling, code generation, lean/agile development and dynamic scripting languages on the JVM such as Groovy and CFML. He also has a blog.
