Interview and Book Review: DSL Engineering
Markus Völter, one of the authors of "Model-Driven Software Development", has published a new book in the field of model-driven software development (MDSD). "DSL Engineering" focuses on the design and implementation of domain-specific languages (DSLs).
DSLs are languages with a vocabulary optimized for efficiently describing problems and solutions in a particular domain. General-purpose languages (GPLs) such as Java may be able to describe the same problems and solutions, but they typically require more verbose programs, are harder for tools to analyze, and are harder for domain experts to understand.
Well-designed DSLs can therefore be used by non-programmers to formally define business-related models. These models are then usually transformed into artifacts such as GPL source code or documents.
InfoQ had the chance to talk to Markus Völter, the lead author of the book, and Christian Dietrich, one of the co-authors:
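To make the "model, then transform" pipeline concrete, here is a minimal sketch under loudly stated assumptions: the one-line rule syntax (`when <signal> <op> <number> then <action>`), the refrigerator example and all names (`Rule`, `parse`, `generate_java`, `FridgeRules`) are hypothetical and not taken from the book. A real project would use a language workbench such as Xtext or MPS rather than a regular expression, but the stages are the same: concrete syntax is parsed into a model, and a generator turns the model into GPL source code.

```python
import re
from dataclasses import dataclass

# Hypothetical rule syntax: "when <signal> <op> <number> then <action>"
RULE = re.compile(r"when\s+(\w+)\s*(>|<|==)\s*(-?\d+(?:\.\d+)?)\s+then\s+(\w+)")


@dataclass
class Rule:
    """One rule of the model (the abstract syntax)."""
    signal: str
    op: str
    threshold: float
    action: str


def parse(source: str) -> list[Rule]:
    """Turn DSL text (concrete syntax) into a list of Rule objects (the model)."""
    rules = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = RULE.fullmatch(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        signal, op, threshold, action = match.groups()
        rules.append(Rule(signal, op, float(threshold), action))
    return rules


def generate_java(rules: list[Rule], class_name: str = "Rules") -> str:
    """Model-to-text transformation: emit an abstract Java class for the rules."""
    signals = list(dict.fromkeys(r.signal for r in rules))  # unique, in order
    actions = list(dict.fromkeys(r.action for r in rules))
    out = [f"public abstract class {class_name} {{"]
    out += [f"    protected abstract void {a}();" for a in actions]
    out.append("")
    params = ", ".join(f"double {s}" for s in signals)
    out.append(f"    public void evaluate({params}) {{")
    out += [f"        if ({r.signal} {r.op} {r.threshold}) {{ {r.action}(); }}" for r in rules]
    out.append("    }")
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    model = parse("""
        # refrigerator monitoring (hypothetical example)
        when temperature > 8 then alarm
        when doorOpenSeconds > 60 then beep
    """)
    print(generate_java(model, "FridgeRules"))
```

The domain expert only ever sees the two `when ... then ...` lines; the `double` parameters and abstract action methods in the generated Java are the kind of GPL noise the DSL keeps out of their way.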
InfoQ: Could you please both describe your experience in the field of DSLs?
Markus Völter: I have worked with models and code generation for about ten years now. I started with UML-based languages and generators, but quickly moved over into the DSL camp. Especially when using modern language workbenches, the DSL approach is just so much more powerful and productive than UML.
In any case, I have spent these last ten years building languages, analysers and generators myself, and helping customers do the same. Example projects include the proof-of-concept of the AUTOSAR standard, various DSLs for architecture definition, DSLs for configuring hearing aids and refrigerators, as well as DSLs that are used in the insurance business and in requirements engineering. In terms of tooling, I worked mainly with the good old openArchitectureWare, Eclipse EMF/GMF/Xtext and, more recently, JetBrains MPS. In the last two years I have spent most of my time developing the mbeddr system, which is a DSL-based development environment for embedded software based on JetBrains MPS.
Christian Dietrich: I have been working on modeling projects for more than six years. My first contact with MDSD was in a project with UML and a big proprietary code generator. Since then I have done a lot with openArchitectureWare, using both UML-based and EMF-based models. In 2008 I discovered oAW Xtext and was excited: creating textual languages with reasonable tooling had become so easy compared to the old days of lex and yacc or ANTLR. I started digging into the framework, and since it moved to Eclipse I have done a lot with it, both in my job and, in my spare time, as a supporter on the Xtext forum at Eclipse. I have also done some work with other MDSD and DSL technologies, such as MPS, but have stayed focused on Xtext.
InfoQ: Your book covers the whole cycle of designing, implementing and applying domain-specific languages. What is the sweet spot for using DSLs and model-driven software development (MDSD)?
Markus: I don't think there is one sweet spot; that's why the fourth part of the book covers six different areas of applicability. These areas include requirements engineering, software architecture, very specific application logic, software implementation, using DSLs as a developer utility, and using them in the context of software product lines. I have seen good uses of DSLs in each of these areas. Here are some guidelines that probably determine whether using a DSL in any of these fields will be successful, in addition to having competent developers, of course. You have to really understand the domain for which you're building the DSL, or at least have some way of iteratively building this understanding. Also, the domain needs sufficiently many specific abstractions or notations to warrant building and using a DSL instead of a GPL with a library or framework. Another good reason for using a DSL is if you need to do advanced analyses, for which you need static (i.e. compile-time) domain-level semantics. Yet another is if you want non-programmers to develop applications in the domain (notice how I don't use the word "program" here), which typically requires removing all the GPL-induced noise from the code. Finally, the more often you use a DSL to build systems, and the lower the effort to build a DSL (this is where language workbenches come in), the easier it is to justify developing one.
InfoQ: On the other hand, could you also give some hints on when not to use those techniques?
Markus: Well, if and when the criteria I describe above are not present :-) To be a bit more serious, there is a saying that any useful DSL will inevitably end up looking like a GPL. My experience tells me that this is not true, but there is, of course, the risk that you start developing a DSL before you really understand the domain. Your DSL will be nice, declarative and simple until you really understand the domain. At that point, you may be tempted to add loops, conditionals and all the other stuff found in GPLs. This is indeed a risk. Modular languages and language extension can mitigate this risk somewhat: instead of developing a completely separate DSL, you may want to consider incrementally extending a base language such as Java or C with domain-specific concepts. Users can always fall back to the Java or C level, so you don't have to provide DSL concepts for every corner case in the domain. Some of the current language workbenches (MPS in particular) are really good at this kind of language modularity. The mbeddr project I mentioned above exploits this idea to incrementally extend C with domain-specific concepts for embedded software development.
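The point about "static, domain-level semantics" can be illustrated with a small sketch. The state-machine model and the `StateMachine`/`check` names below are hypothetical and not from the book; the sketch only assumes that a DSL program has been parsed into such a model. Because the model states its domain concepts explicitly, errors such as an unreachable state or two transitions for the same event can be found before any code is generated, which is much harder to recover from equivalent hand-written GPL code.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StateMachine:
    """Hypothetical domain model: states plus (source, event, target) transitions."""
    initial: str
    states: set[str]
    transitions: list[tuple[str, str, str]]


def check(sm: StateMachine) -> list[str]:
    """Static, domain-level checks that run before any code is generated."""
    errors = []
    if sm.initial not in sm.states:
        errors.append(f"initial state {sm.initial!r} is not declared")

    seen = set()
    for src, event, dst in sm.transitions:
        for s in dict.fromkeys((src, dst)):  # report each unknown name once
            if s not in sm.states:
                errors.append(f"transition {src} --{event}--> {dst}: unknown state {s!r}")
        if (src, event) in seen:
            errors.append(f"state {src!r} has more than one transition on event {event!r}")
        seen.add((src, event))

    # Reachability: breadth-first search from the initial state.
    successors = {s: [] for s in sm.states}
    for src, _, dst in sm.transitions:
        if src in successors and dst in sm.states:
            successors[src].append(dst)
    reachable = set()
    queue = deque([sm.initial] if sm.initial in sm.states else [])
    while queue:
        state = queue.popleft()
        if state not in reachable:
            reachable.add(state)
            queue.extend(successors[state])
    for s in sorted(sm.states - reachable):
        errors.append(f"state {s!r} is unreachable from {sm.initial!r}")
    return errors


if __name__ == "__main__":
    device = StateMachine(
        initial="off",
        states={"off", "on", "standby", "service"},
        transitions=[("off", "power", "on"), ("on", "power", "off"),
                     ("on", "idle", "standby"), ("on", "idle", "off")],
    )
    for message in check(device):
        print(message)
```

For the example model, the checker reports the ambiguous `idle` transitions out of `on` and the unreachable `service` state.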
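The idea of extending a base language while letting users fall back to it can be sketched in miniature. This is a toy under explicit assumptions, not how mbeddr is implemented (mbeddr works on MPS ASTs, not on strings): the `Saturate` concept, the `reduce_to_c`/`generate` functions and the C helpers `read_sensor` and `actuate` are all hypothetical. Plain C statements pass through unchanged, while the domain-specific concept is reduced to ordinary C during generation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Saturate:
    """Hypothetical domain-specific extension: clamp `var` into [lo, hi]."""
    var: str
    lo: int
    hi: int


# A statement is either plain C (the base-language fallback) or an extension node.
Statement = Union[str, Saturate]


def reduce_to_c(stmt: Statement) -> list[str]:
    """Translate one statement down to base-language (C) lines."""
    if isinstance(stmt, str):
        return [stmt]  # base-language code passes through unchanged
    return [
        f"if ({stmt.var} < {stmt.lo}) {stmt.var} = {stmt.lo};",
        f"else if ({stmt.var} > {stmt.hi}) {stmt.var} = {stmt.hi};",
    ]


def generate(body: list[Statement]) -> str:
    """Reduce every statement and join the resulting C lines."""
    return "\n".join(line for stmt in body for line in reduce_to_c(stmt))


if __name__ == "__main__":
    print(generate([
        "int speed = read_sensor();",  # plain C (hypothetical helper)
        Saturate("speed", 0, 120),     # domain-specific concept
        "actuate(speed);",             # plain C again (hypothetical helper)
    ]))
```

Because the base language is always available, the extension only has to cover the recurring domain idiom, not every corner case.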
InfoQ: This sounds like the best of both worlds - quite powerful. But application development will then be the duty of software developers again?
Markus: Yes, you are right. This works only for DSLs that are used by programmers who know the base language. This highlights a nice difference between two styles of DSLs. Application domain DSLs are intended to be used by domain experts; they should contain the domain concepts, and ideally nothing else. They are often developed top-down, i.e. you start from the concepts found in the real-world domain. The opposite, if you will, are technical DSLs. They are intended to be used by developers. They are often built by *adding* domain-specific abstractions to a GPL; they *should* contain all the GPL stuff so as not to restrict users, but still make life simpler by providing higher-level concepts. They are typically developed bottom-up, i.e. you start from the GPL and existing idioms or patterns.
InfoQ: Christian, you are currently working in one of the largest MDSD projects in Germany. What are the steps one has to take to successfully design and use domain-specific languages?
Christian: The first step is to understand the domain and its concepts. Without that knowledge it is impossible to find the right abstractions. Then, when defining the abstract and concrete syntax for the language, keep an eye on the understandability and clear semantics of the concepts. It often helps to work iteratively. If you use the DSLs to generate code or documentation, or to let an interpreter run simulations, develop these artifacts together with the concepts in the DSLs. This helps to validate the quality of the abstract syntax. And if you develop DSLs and generators as a framework, eat your own dog food: it shows whether your DSL reaches its goals or is useless. Another point is to think early about the size and scale of the models, so that you can design the DSLs to perform well with realistic models. Five-line test files won't uncover misconceptions regarding performance.
InfoQ: Looking back over the years of your project, what are the most common pitfalls in using DSLs?
Christian: I think a common pitfall is that, over time, DSLs become either over-engineered (defining a concept for every special case leads to zero abstraction) or too general ("GPLish"). You might end up in complexity hell. To counter this effect, it is important to keep evolving the DSL, so you must not be afraid of refactoring the language, especially if you can do so with tool support. This is easier in MPS than in Xtext. Another common pitfall is to concentrate too much on the concrete syntax and lose sight of the abstract syntax and semantics. As mentioned before, it is a bad idea to develop DSLs in an ivory tower without contact with, and application to, the domain. You have to prove on a regular basis that your DSLs fit the needs of the domain.
InfoQ: Markus, it seems that using DSLs and a model-driven approach does not pay off from the first day. What is your opinion regarding the project sizes and setups that should be present to benefit from MDSD?
Markus: I wouldn't subscribe to what you said there. A small DSL that I can build in two hours can certainly pay off on the first day. Of course, a bigger DSL takes longer to develop, and hence it takes longer for it to pay off. It is all about the ratio. So there really is no specific size or setup. I have seen simple DSLs being developed by small developer teams, used only in one project. I have also seen big efforts being spent that are assumed to pay off over the years-and-decades-long lifetime of a product platform. Especially in the beginning it is a good idea to start with a small problem, since larger efforts - as usual - have an increased risk of failing for all the well-known reasons associated with size and scale. Once again, I like the approach of incrementally extending a base language: it lets you add more domain-specific abstractions as the need arises ("three strikes and you automate").
InfoQ: In the book, three DSL frameworks are mentioned: Xtext, JetBrains MPS and Spoofax. Could you elaborate on the differences between these frameworks? Are they replaceable by each other, or does each have its own unique scenario or use case?
Markus: The three are very different, which was a major reason for selecting them for the book. Xtext is the mainstay for building textual, external DSLs these days. It is mature, well supported, and supports Eclipse EMF, which is the backbone of many of today's modeling efforts. Spoofax is also Eclipse-based, but it does not rely on EMF. It is developed at TU Delft and is much more innovative in terms of the features it supports: for example, it has a declarative language for name binding and scoping, and supports language modularity to a greater extent than Xtext. On the other hand, it is much less widespread. JetBrains MPS is quite different from those two. It does not use plain text editing and parsing. Instead, it uses a projectional approach, where each editing action directly changes the AST, and what you see and interact with is merely a projection of it. This allows a much wider range of notations, including tables, fraction bars and, later this year, graphical shapes. It also makes it very easy to extend languages and to combine independently developed extensions in a single program. It is not as widely used as Xtext, but it's growing nicely. With all three tools you can build simple bread-and-butter DSLs; for those, the tools are interchangeable. However, their emphasis is quite different. For example, Xtext with Xbase and Xtend interoperates quite nicely with the Java ecosystem: it is easy to build DSLs that reuse Java's types and expressions and generate to Java. Spoofax, being developed by a research group, is also a research vehicle, as showcased by some of its more recent features. MPS clearly has its sweet spot if you build whole ecosystems of languages, with languages referencing, extending and embedding each other, or when "strange" domain notations are required. It's hard to answer this question briefly. I guess you should read Part III of the book and then form your own opinion :-)
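The projectional approach described above can be illustrated with a toy model. This is only a sketch of the general idea, not MPS's actual architecture or API; the node classes, `replace_at`, `as_text` and `as_tree` are hypothetical. The AST is the only stored representation, an editing action rewrites the tree directly (no text is ever parsed), and two different renderers project the same tree.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Node"
    right: "Node"


@dataclass
class Div:
    num: "Node"
    den: "Node"


Node = Union[Num, Add, Div]


def replace_at(root: Node, path: tuple[str, ...], make: Callable[[Node], Node]) -> Node:
    """An editing action: rewrite the node at `path` directly in the AST (nothing is parsed)."""
    if not path:
        return make(root)
    parent = root
    for attr in path[:-1]:
        parent = getattr(parent, attr)
    setattr(parent, path[-1], make(getattr(parent, path[-1])))
    return root


def as_text(n: Node) -> str:
    """Projection 1: conventional infix notation."""
    if isinstance(n, Num):
        return str(n.value)
    if isinstance(n, Add):
        return f"({as_text(n.left)} + {as_text(n.right)})"
    return f"({as_text(n.num)} / {as_text(n.den)})"


def as_tree(n: Node, depth: int = 0) -> list[str]:
    """Projection 2: an indented outline of the same tree."""
    pad = "  " * depth
    if isinstance(n, Num):
        return [f"{pad}Num {n.value}"]
    if isinstance(n, Add):
        return [f"{pad}Add"] + as_tree(n.left, depth + 1) + as_tree(n.right, depth + 1)
    return [f"{pad}Div"] + as_tree(n.num, depth + 1) + as_tree(n.den, depth + 1)


if __name__ == "__main__":
    expr = Add(Num(1), Num(2))
    # Wrap the right operand in a division, as a user might do by typing "/" after it.
    expr = replace_at(expr, ("right",), lambda old: Div(old, Num(3)))
    print(as_text(expr))  # (1 + (2 / 3))
    print("\n".join(as_tree(expr)))
```

In a real projectional editor the second projection could just as well draw a fraction bar or a table cell; since both views are computed from the same tree, adding a notation never requires a new, possibly ambiguous grammar.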
About the Book Authors
Markus Völter has been working in the field of model-driven software development and domain specific languages for 10 years now. He is also a regular speaker on this topic at various conferences.
Christian Dietrich works as a consultant for Itemis AG, Germany. Itemis not only provides consulting services for the Eclipse projects Xtext and Xtend, which are used to define DSLs and to generate artifacts from models, but also actively develops these projects.