A domain-specific language (DSL) is a programming language targeted to a particular problem domain, rather than a general purpose language. DSLs are a great way of creating DRYer code that is "business user readable". Over the last few years, a lot of articles about DSLs have been published in the general programming world.
Creating a domain specific language isn't hard. However, our understanding of the domain always continues to evolve, so for a DSL to be useful over the long run, we need to have a strategy for evolving our DSLs. If you are undertaking a large project or developing a software product line (SPL) where you will be using your DSLs over a substantial period of time, you had better have an idea of how you're going to handle the evolution of your DSLs.
From backwards compatibility through versioning, to automated transformation of statements, this article will look at approaches to structuring your domain specific languages in a way that will simplify the process of evolving them over time.
Avoiding the Problem
In the 3GL world, language designers are very aware of the importance of backwards compatibility. Whatever happens in the next version of Java, it's pretty unlikely that it'll break any of the functionality added to the previous version. However, with DSLs our understanding of the domain can change radically as we continue to explore the problem space. On a single project, business experts often mention new domain concepts late in the project that force you to fundamentally reassess how best to model the domain. In a software product line, each new project brings different requirements that can affect the optimal design of the DSLs to describe them.
Historically in the Domain Specific Modeling (DSM) world we've tried to minimize these issues by recommending that DSLs only be introduced when the business rules change frequently (to provide a Return On Investment (ROI) for the overhead of developing the DSLs and the associated tooling) but where the structure of the domains is fairly static (to minimize the problems encountered when you start to evolve the structure of the DSLs). However, as DSLs are being used more widely, it's important to understand the issues associated with DSL evolution and some of the strategies for handling those issues.
What's the Problem?
Some types of DSL evolution are not a problem at all. If you want to add a new domain concept or a new optional attribute to a concept, you're simply extending the DSL grammar and it won't break any existing code. However, there are situations where you will have to consider the implications of such changes in the grammar on your existing statements. Some of these situations are:
- Remove a concept/attribute
- Add an attribute that requires a value that may be different for different statements
- Turn an attribute into a separate concept with its own attributes
- Add a new constraint that some of your existing statements may not meet
Abstract Grammar vs. Concrete Syntax
One important DSL concept that is going to make the rest of this discussion a little easier to follow is the difference between abstract grammar and concrete syntax. The abstract grammar of a DSL describes the structure of valid statements - including any associated constraints. The concrete syntax describes the details of exactly how you write (or draw) statements in the DSL.
For example, let’s say I have a DSL for describing a state machine; it might include the concept that an Object can have many States. The abstract grammar will just convey that an Object can have many states (and any constraints, such as each object must have at least one state and each state must have a unique name within a given object). The concrete syntax might be a drawing in a modeling tool, an XML document, code in an internal DSL in Groovy or Ruby, a spreadsheet, a database schema for database-based DSLs or a custom textual syntax. While there are some elements of DSL evolution that are more challenging in certain concrete syntaxes (such as handling the positional and graphical data associated with graphical languages), we can discuss most of the issues and implications of DSL evolution by looking at the abstract grammar, understanding that the concrete syntax we use to express the statements in that grammar is a secondary concern.
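To make the distinction concrete, here is a minimal Python sketch of the abstract grammar for the state machine example, independent of any concrete syntax. The class names `State` and `StatefulObject` and the function `check_constraints` are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str


@dataclass
class StatefulObject:
    # Hypothetical name for the "Object" concept in the state machine DSL.
    name: str
    states: list[State] = field(default_factory=list)


def check_constraints(obj: StatefulObject) -> list[str]:
    """Return the constraint violations for one object (empty list if valid)."""
    errors = []
    # Constraint 1: each object must have at least one state.
    if not obj.states:
        errors.append(f"{obj.name}: must have at least one state")
    # Constraint 2: state names must be unique within the object.
    seen = set()
    for state in obj.states:
        if state.name in seen:
            errors.append(f"{obj.name}: duplicate state name '{state.name}'")
        seen.add(state.name)
    return errors
```

Whether a statement arrives as XML, a diagram or a spreadsheet, it could be loaded into a structure like this and checked against the same constraints.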
Approaches to Handling DSL Evolution
There are three common approaches to handling DSL evolution in systems where you have several existing statements:
- requiring backward compatibility
- versioning the languages
- automating statement transformation
The simplest way of handling the problem is simply to avoid it. Never allow changes to the DSL grammar that could potentially break any existing statements. This is often the approach people take when they start working with DSLs and it works for a while. But eventually use cases are going to come up where you're really going to want to evolve the DSL in a way that may break the existing statements.
The quickest solution to DSL evolution that may break the existing statements is to version the DSLs. Whether you're using implicit parsing of some kind of internal DSL or an explicit parser with an external DSL (perhaps using ANTLR, Xtext or some kind of XML parser if you use an XML concrete syntax), when you need to make a breaking change, you can just release a new version of the parser and make sure your system supports multiple versions, just upgrading statements when they need to take advantage of the features of the expanded grammar available in a later version. This approach can actually work pretty well up to a point, but eventually the overhead of maintaining, supporting and debugging multiple versions of the language can become burdensome.
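As a rough illustration of the versioning approach, the Python sketch below keeps one parser per language version and picks one based on a version marker in the statement. It assumes an XML concrete syntax with `domainObject` and `property` elements, and the `<model>` wrapper element, the `dslVersion` attribute and the function names are all hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET


def parse_v1(root):
    # Version 1 of the grammar: a domain object only has a name.
    return [{"name": o.get("name"), "properties": []}
            for o in root.findall("domainObject")]


def parse_v2(root):
    # Version 2 of the grammar: a domain object may also list properties.
    return [{"name": o.get("name"),
             "properties": [p.get("name")
                            for p in o.findall("properties/property")]}
            for o in root.findall("domainObject")]


# One parser per supported language version.
PARSERS = {"1": parse_v1, "2": parse_v2}


def load_model(xml_text):
    root = ET.fromstring(xml_text)
    # Assumption: documents without a version marker are treated as version 1.
    version = root.get("dslVersion", "1")
    try:
        parser = PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported DSL version: {version}") from None
    return parser(root)
```

Every entry in `PARSERS` is a version that has to be maintained and debugged, which is where the overhead described above comes from.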
Statement Transformation Automation
The ideal solution is to be able to automatically evolve your DSL statements, so that as you make changes to the grammar, the statements are automatically upgraded (where possible). One of the easiest ways of handling this is by applying transformations to your grammar and then writing scripts to apply the same changes to your DSL statements, using either a scripting language or a language such as XSLT or ATL (the ATLAS Transformation Language) that supports Model-to-Model (M2M) transformations.
Let's imagine we're using an XML concrete syntax for a DSL that describes domain objects for an application. To start with we might have two domain classes, User and Product, as shown in Listing 1 below.
Listing 1: XML Syntax for User and Product domain objects.
<domainObject name="User" />
<domainObject name="Product" />
We decide pretty quickly that we want to add the additional concept of Property and the idea that a domainObject can have 0..n properties. This transformation won't break any existing statements. It simply involves an "add concept" and an "add optional relationship", i.e. we have added a new concept (property) and a new optional relationship (a domainObject *can* have n properties but doesn't have to have any). As shown in Listing 2, we could now express the statement with the Property concept:
Listing 2: Domain objects with the property attributes.
<domainObject name="User">
  <properties>
    <property name="FirstName" />
    <property name="LastName" />
  </properties>
</domainObject>
<domainObject name="Product">
  <properties>
    <property name="Title" />
    <property name="Price" />
  </properties>
</domainObject>
Now let's add the idea that properties can have a validation rule. This is an "add optional attribute" transformation: we're adding an optional "validationRule" attribute to properties. Again, because the attribute is optional, the previous statements are still valid. Applying this transformation to the grammar doesn't break our current DSL statements, so we don't need to do anything to them.
Let's say we work with that for a while and end up with the XML shown in Listing 3.
Listing 3: Domain object properties with validation rules.
<domainObject name="User">
  <properties>
    <property name="FirstName" />
    <property name="LastName" validationRule="Required" />
  </properties>
</domainObject>
<domainObject name="Product">
  <properties>
    <property name="Title" validationRule="maxlength=50" />
    <property name="Price" validationRule="isNumeric" />
  </properties>
</domainObject>
But now we realize that we have some cases where we need to be able to associate multiple validation rules to a single property. There are lots of ways of solving this problem. Let's look at one approach. Firstly, we might just change the validationRule to be a comma delimited list of validation rules. This change wouldn't require any changes to existing statements (assuming there are no commas in any of the current validation rules). However, the language now suffers from a misleading attribute as it isn't obvious that the validationRule supports a comma delimited list of rules.
The next step might be to apply a "rename attribute" transformation. By renaming property:validationRule to property:validationRuleList you would have a more semantically meaningful attribute name. To do this, you need some way of applying such transformations to existing statements. The concrete syntax you use will affect the best approach, but (for example) an XML concrete syntax (or anything that can be transformed to and from an XML projection) makes this sort of transformation fairly easy to apply.
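A "rename attribute" transformation over an XML concrete syntax could be scripted along the following lines. This is only a sketch using Python's standard `xml.etree.ElementTree` module; the helper name `rename_attribute` is an assumption, and an XSLT stylesheet would be an equally reasonable choice.

```python
import xml.etree.ElementTree as ET


def rename_attribute(root, element_tag, old_name, new_name):
    """Rename an attribute on every matching element; return how many changed."""
    changed = 0
    for element in root.iter(element_tag):
        if old_name in element.attrib:
            # Move the value across, dropping the old attribute name.
            element.set(new_name, element.attrib.pop(old_name))
            changed += 1
    return changed


statement = """
<domainObject name="Product">
  <properties>
    <property name="Title" validationRule="maxlength=50" />
    <property name="Price" validationRule="isNumeric" />
  </properties>
</domainObject>
""".strip()

root = ET.fromstring(statement)
count = rename_attribute(root, "property", "validationRule", "validationRuleList")
print(count)
print(ET.tostring(root, encoding="unicode"))
```

Returning the number of changed elements gives a simple way to confirm that the script touched as many statements as expected.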
We continue to evolve the application, and unfortunately we find that our validation rules require more sophisticated parameterization. For example we have a rule that Password and Password Confirmation properties match when a user is registering on the website. One way of writing this would be to have a validation rule like the XML snippet shown in Listing 4.
Listing 4: Validation rule with Password and Password Confirmation properties.
<validationRule name="PasswordMatchesConfirmation"
                type="propertyValuesMatch"
                firstPropertyName="Password"
                secondPropertyName="PasswordConfirmation" />
Now the problem we have is that we need to apply an "attribute to associated concept transformation". Let's break that down. Firstly, it's an "attribute to concept transformation", as we're taking the attribute property:validationRule and replacing it with a separate validationRule concept which (in an XML concrete syntax) is expressed as a separate XML element. It's an "attribute to *associated* concept transformation" because I'm deciding to have a separate section in my language for describing rules which can be reused by different properties. So for example, if both first name and last name are required properties, both can use the same "required" validation rule. The alternative approach, which is a better fit for some cases, would be an "attribute to *composed* concept transformation" where the rules would be composed within each property.
With an attribute to *composed* concept transformation this would have become what is shown in Listing 5 below.
Listing 5: Domain object with "attribute to *composed* concept" transformation.
<domainObject name="User">
  <properties>
    <property name="FirstName" />
    <property name="LastName">
      <validationRule name="Required" />
    </property>
  </properties>
</domainObject>
<domainObject name="Product">
  <properties>
    <property name="Title">
      <validationRule name="maxlength" value="50" />
    </property>
    <property name="Price">
      <validationRule name="isNumeric" />
    </property>
  </properties>
</domainObject>
Listing 6 below shows what this has become with an attribute to *associated* concept transformation.
Listing 6: Domain object with "attribute to *associated* concept" transformation.
<domainObject name="User">
  <properties>
    <property name="FirstName" />
    <property name="LastName" validationRuleNameList="Required" />
  </properties>
  <validationRules>
    <validationRule name="Required" />
  </validationRules>
</domainObject>
<domainObject name="Product">
  <properties>
    <property name="Title" validationRuleNameList="TitleMaxlength" />
    <property name="Price" validationRuleNameList="isNumeric" />
  </properties>
  <validationRules>
    <validationRule name="TitleMaxlength" value="50" />
    <validationRule name="isNumeric" />
  </validationRules>
</domainObject>
Again these are transformations that can be applied automatically to existing DSL statements.
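As one possible sketch of the "attribute to associated concept" case, the Python function below moves each property's `validationRule` attribute into a shared `validationRules` section and leaves a `validationRuleNameList` reference behind. The function name and the naming convention for parameterized rules (prefixing the property name, so that `maxlength=50` on Title becomes `TitleMaxlength`) are assumptions chosen to match the names used in Listing 6.

```python
import xml.etree.ElementTree as ET


def attribute_to_associated_concept(domain_object):
    """Move property validationRule attributes into a shared validationRules section."""
    rules = {}  # rule name -> optional value, kept in first-seen order
    for prop in domain_object.iter("property"):
        raw = prop.attrib.pop("validationRule", None)
        if raw is None:
            continue
        names = []
        for rule in (r.strip() for r in raw.split(",")):
            if not rule:
                continue
            if "=" in rule:
                # Parameterized rule: derive a per-property name (assumed convention).
                kind, value = rule.split("=", 1)
                name = prop.get("name") + kind[:1].upper() + kind[1:]
            else:
                # Unparameterized rules are shared between properties by name.
                name, value = rule, None
            rules.setdefault(name, value)
            names.append(name)
        prop.set("validationRuleNameList", ",".join(names))
    if rules:
        section = ET.SubElement(domain_object, "validationRules")
        for name, value in rules.items():
            rule_element = ET.SubElement(section, "validationRule", name=name)
            if value is not None:
                rule_element.set("value", value)
    return domain_object
```

A real script would also need to decide what to do when two properties produce the same rule name with different values; this sketch simply keeps the first one it sees.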
The Limits of Automation
Of course, there are some transformations that cannot be made automatically. It's quite possible to automatically scan existing statements when you want to apply a "delete concept" or "delete attribute" transformation, but you then have two options. Either you make manual changes based on a report that the transformation script provides, or you use "deprecate" instead of "delete" transformations, where any tools you use nag you whenever they see the deprecated items and may not let you add new ones, but won't force you to remove them.
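The scanning step for a "delete attribute" transformation might look something like this Python sketch, which only reports where the attribute is still used and leaves the decision to a person. The function name `find_attribute_uses` is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_attribute_uses(root, element_tag, attribute):
    """List (element name, attribute value) for each element still using the attribute."""
    return [(element.get("name"), element.get(attribute))
            for element in root.iter(element_tag)
            if attribute in element.attrib]


statement = """
<domainObject name="User">
  <properties>
    <property name="FirstName" />
    <property name="LastName" validationRule="Required" />
  </properties>
</domainObject>
""".strip()

# Report every property that would be affected by deleting validationRule.
for owner, value in find_attribute_uses(ET.fromstring(statement), "property", "validationRule"):
    print(f"property {owner} still uses validationRule={value}")
```

The same report could drive either a manual clean-up or the warnings issued by a "deprecate" transformation.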
Equally if you want to apply an "add required attribute" transformation to (perhaps) add a data type attribute to all of the properties, unless you can come up with a default (everything is a string unless I say otherwise) or some intelligent scripting rules, the best an automated tool could offer would be an efficient UI for populating all of the entries for the historic statements.
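For the "add required attribute" case, a script can apply a default plus any inference rules you trust, and hand back the entries that still deserve a human look. The sketch below is one way of doing that in Python; the `dataType` attribute name, the `number` and `string` values and the inference rule are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def add_required_attribute(root, element_tag, attribute, default, infer=None):
    """Populate a new required attribute; return names of elements that got the default."""
    defaulted = []
    for element in root.iter(element_tag):
        if attribute in element.attrib:
            continue  # already populated, leave it alone
        value = infer(element) if infer else None
        if value is None:
            # No rule applied: fall back to the default and flag for review.
            value = default
            defaulted.append(element.get("name"))
        element.set(attribute, value)
    return defaulted


def infer_data_type(prop):
    # Hypothetical rule: a property validated as numeric is probably a number.
    if prop.get("validationRule") == "isNumeric":
        return "number"
    return None
```

Calling `add_required_attribute(root, "property", "dataType", "string", infer_data_type)` would fill in every property and return the list of those that fell back to the default, which is the list a review UI would need to show.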
Internal v. External DSLs
It's important to be aware of the limitations of internal DSLs. They provide much of the benefit of DSLs in terms of being "end user readable", but automatically applying transformations to statements in internal DSLs embedded within another language is not an easy problem to solve in the general case. Internal DSLs are great, but make sure you have a strategy for applying transformations to those DSLs if you're going to use them extensively in a large project or a software product line. Otherwise a little bit of time in creating the tooling around an external DSL may well pay for itself over the life of a project.
Perhaps the most important takeaway from all of this is that if your DSLs are successful, then you're going to end up with lots of statements in them. If that is the case and you ever need to evolve your DSLs, you had better have a strategy for handling the transformation of those statements.
Also, it's important to realize that this is not a solved problem. There is lots of work going on around this space, but with the exception of MetaEdit+ from MetaCase, even most of the Domain Specific Modeling tooling doesn't do a great job of handling meta-model evolution.
About the Author
Peter Bell is CEO/CTO of SystemsForge. He has developed a Software Product Line for generating custom web applications that blends feature modeling, product line engineering and domain specific modeling. He writes and presents internationally on domain specific modeling, code generation, lean/agile development and dynamic scripting languages on the JVM such as Groovy and CFML. He also has a blog.
Actually, one of the issues we should also take into account is how extensible the language is. A language contains a set of constructs: some of them are totally focused on domain representation and action, while others are more related to the extension of the language itself. Function declarators and new data type declarators are examples of the latter.
When working with the first kind, the core domain concepts should be included. A dynamic domain should be studied, since the things that change may be concepts (usually old concepts remain the same and new concepts are introduced) and actions. Actions can be added using the extension capabilities of the language. New concepts may follow the same path, but usually mean adding new constructs.
Here we have another problem: The language may become bloated.
So, I would add as tips:
1. Work the language with the core concepts.
2. Add extension constructs to allow flexibility in actions, and possibly the capability of adding new concepts.
3. Make concepts independent, so adding new ones won't affect the old ones.
4. Keep an eye on bloating languages.
William Martinez Pomares
get the UML right
- is private access in UML
+ is public access in UML
Re: get the UML right
BTW, in my usage the UML is correct. The name object is a UML attribute, not some sort of language property (e.g. not a C# property). The MetaObject is just that, part of the metamodel that contains a DSL's model. What is not shown is the getName() accessor method. I purposely left out the operations compartment of the classes in the static structure diagram so as not to distract. I assume anyone familiar with UML followed that purposeful omission, as I would never provide direct public access to an object's instance attribute. If you read my article you will see why MetaObject is used as an abstract base class and what sort of concrete subclasses extend it. The use of the name attribute of the MetaObject class is clearly explained.
The article was originally posted on my company web site, and later published by InfoQ: shiftmethod.com/ under Publications > Languages and Tools.
@William, Interesting points. There are indeed a number of ways of supporting language extensions, although you run into the same problem should (for example) the structure that you use to describe user defined data types change over time in a way that requires transformations to your data type definitions. Another approach to bear in mind is to have a lot of little languages to avoid language bloat.
@"Humphrey"/Vaughn, The image was added by the InfoQ team, so not much I can say about that other than apologies if it offends!
Re: Thanks all!
This was a good introductory article.
What I'm missing here are some things.
Use Refactoring to evolve internal DSLs. At least for statically typed languages but also for the dynamic ones there is better and better refactoring support available. So you can use these tools to perform even more complicated automated refactorings of the DSL statements of your internal DSL.
Read the DSLs into the semantic model, and generate the next version DSL statements from the model using an appropriate generator. So it is handy that you have a parser as well as a generator for each version of your language. You could also apply transformations or modifications of the parsed content after it is contained in the semantic model before regeneration.
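A minimal Python sketch of that parse, model, regenerate pipeline might look as follows. It assumes the article's XML syntax, reads the older form where rules live in a `validationRule` attribute, and regenerates the composed form where each rule is a child element. The `Property` and `DomainObject` classes and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    rules: list[str] = field(default_factory=list)


@dataclass
class DomainObject:
    name: str
    properties: list[Property] = field(default_factory=list)


def parse_old(xml_text):
    """Parser for the older syntax: rules live in a validationRule attribute."""
    element = ET.fromstring(xml_text)
    obj = DomainObject(element.get("name"))
    for p in element.iter("property"):
        raw = p.get("validationRule", "")
        rules = [r.strip() for r in raw.split(",") if r.strip()]
        obj.properties.append(Property(p.get("name"), rules))
    return obj


def generate_new(obj):
    """Generator for the newer syntax: each rule is a child validationRule element."""
    element = ET.Element("domainObject", name=obj.name)
    props = ET.SubElement(element, "properties")
    for prop in obj.properties:
        p = ET.SubElement(props, "property", name=prop.name)
        for rule in prop.rules:
            # "maxlength=50" becomes name="maxlength" value="50".
            name, _, value = rule.partition("=")
            r = ET.SubElement(p, "validationRule", name=name)
            if value:
                r.set("value", value)
    return ET.tostring(element, encoding="unicode")
```

Any extra modifications can be applied to the `DomainObject` instances between `parse_old` and `generate_new`, since at that point the statements are plain data rather than text.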
Some examples of possible XSLT transformations for your sample would have been nice. That said, XML is in general a bad choice for the concrete syntax of a DSL, as it is not very human readable (for the target audience of business users).
Also a discussion on testing (acceptance and functional) of the evolved DSLs would have been interesting.
What is also interesting is how this later evolution corresponds to the initial evolution of a DSL together with the target users. These first steps are very dynamic, and the abstract grammar as well as the concrete syntax change constantly. But as there aren't many DSL statements at this time, this evolution need not deal with transforming existing code.
It's the same situation as with the development of APIs. Once they have been published they are very hard to change, as dependent code exists on top of those APIs. So which of the lessons we learned from evolving APIs apply to DSLs as well, and which don't?
Re: Open Issues
Thanks for the comment! I certainly see the potential for using refactoring to transform internal DSLs (as you said - particularly in statically typed languages with good refactoring support). Unfortunately, the languages that currently have good refactoring support aren't usually great candidates for elegant internal DSLs. Also, some of the basic refactorings required for DSL evolution seem to me quite different from the common refactorings we use in IDEs, so (for example) I'm not immediately sure how I'd implement an "attribute to associated concept transformation" using IDE refactoring - even using IntelliJ for Java, ReSharper for C# or similar. Would you care to suggest how that might be done in practice?
I definitely agree that one productive approach for upgrading statements is loading them into a runtime model and then using that with some kind of generator to generate statements in the upgraded version - with or without human intervention depending on the kind of transformation. The challenge is that if you use an internal DSL where you may well have custom code in the host language mixed in with your DSL statements, generating that while preserving any custom code is non-trivial. You can minimize this by following conventions such as not mixing custom code with the generated statements, using any of the standard patterns for active code generation (generally separating generated code into different files using sub-classing, mixins, AOP and the like - although occasionally using protected blocks if you really must). However, the nice thing about external DSLs in this case is that they are less flexible, so you can't just put arbitrary 3GL code into your model (unless you specifically decide to support that).
I actually disagree that XML is a *bad* syntax for an external DSL. It's not as elegant as many others, but it's trivial to parse, great for sharing between different types of systems and very widely processable, and you get some very basic validations for close to free using XSDs. I agree that it's not very end user readable, but there is no reason why end users should have to read the actual concrete syntax of an external DSL when you can pretty print it to the screen using some kind of simple templating language. It *is* an extra step, but given the simplicity of processing XML vs. working with ANTLR or Lex and Yacc for a custom textual syntax, I think there are valid use cases for XML as a concrete syntax - especially when getting started with external DSLs. As tooling like Xtext continues to mature (now part of the Eclipse Modeling Framework - until recently part of openArchitectureWare) I think the math on that might change, but even the latest version of EMF has issues to deal with - especially when processing really large models (although I believe that's got a lot better recently). Also, Eclipse based tools are best for end users comfortable with Eclipse, and that isn't a good fit for all use cases.
Testing of DSLs is a whole other article - I may propose an article on the various approaches to testing and DSLs for the spring.
I agree that DSLs are most actively changing early on, but the fact is that if (for example) you have an active, long lived Software Product Line, I can say from experience that even many years in you'll want to apply substantial transformations that in my case would affect tens of thousands of statements across hundreds of projects, so ongoing DSL evolution *is* often something you have to consider. This is equally true if you're using DSLs in a single system for many years. There are so many drivers of change - regulatory, business, technical, competitive, it's quite possible that radical refactoring of DSLs will be required even once you have a lot of statements in the DSLs.
I do agree that API (and also database schema) evolution share traits and lessons for DSL evolution. When I first started looking into DSL evolution seriously back in 2007 I spent most of my time researching approaches for API and db schema evolution as they seemed the most relevant similar areas as there wasn't a huge amount of published material on DSL evolution at the time.
Re: Open Issues
The refactoring of moving an attribute to a separate concept could be done through a move instance method (e.g. if the attribute is done through a method of a builder) or extract class or extract method object. It depends on the concrete case.
Your suggested use of XML is then the AST/Parse Tree representation instead of the DSL itself. Imho the DSL is what the end user sees on his screen. Whatever internal model is used (XML or other) to represent the information does not matter to him and is replaceable in the implementation. You could also use any other kind of parse model or even semantic model and have it rendered any way you'd like. In the end it's all about tree based representations.
The same discussion is the basis of Language Workbenches which also use a tree based internal model and render this to a multitude of projections and projectional editors.
I agree with you that tools like Xtext, which sit somewhere between a pure parser generator and a fully blown language workbench, will have a bright future.
Regarding the change rate of a DSL: What I meant was that the rapid changes in the infancy of an API/DSL can be dealt with quite easily. But once it is published to a bigger audience all those changes must be handled with much more care, planning and attention to detail (especially regarding backward compatibility).
Re: Open Issues
Interesting thoughts re: using refactorings for DSL evolution within internal DSLs. I think as you mentioned that will become more relevant as dynamic languages such as Groovy and Ruby and newer, more concise static languages like Scala get better IDE refactoring support. Definitely something to investigate further - thanks!
Re: XML, I find that when for whatever reason I need to "roll my own" tooling, I can get a usable solution working pretty quickly using XML - as you said - as the underlying representation.
I think you make a reasonable point that from the perspective of the business user - or even modeler - the important representation of the DSL is the specific concrete representation they see. However, even in simple systems, I often find that I'll provide different projections to different types of users, so there really is no one concrete syntax. When I'm talking about things like XML, databases and pure textual DSLs, I'll admit I'm mainly thinking from a model maker's perspective in terms of the details of the implementation under the hood from where the projections can be derived for different types of users, and it is there that I'm suggesting XML works for a certain subset of use cases.
The problem with the definition of the DSL as being "what the end user sees on their screen" is that now that we're all at least thinking in terms of projectional editors with MPS and Intentional, what the user sees depends on which representation they choose and possibly what user type they are!
I definitely agree with your original contention that XML is not a great representation to show to end users - especially business users, and agree that the internal implementation shouldn't have too much impact on such end users (although artifacts of particular concrete syntaxes or implementations may affect their experience of the system). So I guess I should reframe the original comment as "XML is a quick and malleable tree based representation for internally implementing DSL tooling should you need to do so". That's probably a clearer statement than my original.
Thanks for the clarification re: DSLs. The only comment I would add is that I think there is real benefit in being aware of patterns for handling DSL evolution upfront and understanding the implications of choices like internal vs. external DSLs in terms of how easy the statements will be to transform.
I am concerned as I see large efforts progressing with (for example) Ruby or Groovy based DSLs interwoven with native code - especially by dev teams that don't have much of a background in Domain Specific Modeling. I think there are many cases where an internal DSL in a dynamic language can be an excellent choice, but I *do* believe that depending on the implementation it could end up as a maintenance nightmare down the line were the languages to become very widely used and then have to change radically.
I'd just like to see ease-of-evolution as one of the standard forces to be considered when making choices about DSL implementations, and I get the feeling at some of the conferences that I present at, that the promotion of DSLs has started to get ahead of the promotion of DSM best practices.
Re: get the UML right
Anyone familiar with UML wouldn't remove public Properties and Methods and retain the private Properties and Methods.
Never providing direct public access to an object's instance attributes is a convention specific to mainstream OO programming languages. There are many languages where it isn't necessary or doesn't make sense. You wouldn't/couldn't implement it that way in SQL DDL or CLP. Likewise, a reader couldn't assume the DSL was a C#-like/Java-like language.
It is a generic software engineering principle to separate specification from implementation. The private Properties are part of the implementation, not the specification.
Re: get the UML right
Would you consider Eric Evans to be familiar with UML? Please see his book: "Domain Driven Design--Tackling Complexity in the Heart of Software". It is full of examples like mine (actually Eric basically uses package/module scope, not private or public). Note his quote on page 37: "Always remember that the model is not the diagram." Again, if you read my article you can see very plainly that I am not using the UML as source input to my DSL translator. I am not doing graphical MDA. My DSL is textual. Thus I am only using UML to convey design concepts with the same motivation that Eric Evans does. In fact my metamodel and tool are designed using DDD, and they implement its patterns. Using DDD as a design motivation, I only show operations (that's the UML terminology for OO methods; your terminology is wrong) if they add some thought conveying significance. Showing a getter method for an attribute hardly adds significance.
Do you consider Martin Fowler to be familiar with UML? Look at his book: "UML Distilled". It's got examples just like mine. See page 81 for such an example. There are others. See his book P of EAA. It's full of thought conveying diagrams like Eric's and like mine.
Look at the UML classes in my article. Clearly their operations compartment is hidden. I didn't remove any UML operations (what you call "Methods"). They are all there. It's just that the operations compartment is hidden. Your UML tool can do that, can't it? You know how to use your UML tool to hide the operations compartment, don't you?
Re: get the UML right
These kinds of issues with usage of UML, where practices become widespread and are adopted by community leaders, mean a choice between following the letter of a specification and following common practice. UML community leaders are in a different place to other community leaders. Hence my frustrations as UML heads off into less compatible usage.