Projecting a Modular Future

This article first appeared in IEEE Software magazine. IEEE Software offers solid, peer-reviewed information about today's strategic technology issues. To meet the challenges of running reliable, flexible enterprises, IT managers and technical leads rely on IT Pro for state-of-the-art solutions.

 

Two innovations enhance the capabilities of programming languages. Modularity lets you combine independently developed languages without changing their respective definitions. Projectional editing lets you build languages that use a variety of diverse notations beyond just text.

Modular languages provide abstractions that aren’t fixed. Users can pick language extensions from a library and include them in programs without changing the base language definition or IDE. Multiple independently developed extensions can be used together, and new extensions can be developed and used at any time. For a classification of language composition approaches, see “Language Composition Untangled” [1].

Language workbenches (LWBs) are a promising tool for developing modular languages. LWBs are environments that support the efficient implementation of languages and associated tools such as type checkers, compilers, interpreters, and IDEs. (For a little background on LWBs, see the sidebar.)

We focus here on MPS, an open source LWB licensed under Apache 2.0. JetBrains has developed it over the past 10 years, following an initiative by Sergey Dmitriev. MPS has been used in a number of domains, including computational biology [2] and Web applications, in addition to the three we describe in this article. MPS’s most distinguishing feature is its projectional editor (PE), which allows programming languages to incorporate multiple types of notations.

This article describes how we’ve used modular languages and projectional editing to design programming languages for embedded-software development, requirements engineering, and insurance rules. We either built or supported the implementation of the systems we describe.

Why Use MPS and Its Projectional Editor?

Languages typically use either textual or graphical notations; each style comes with different user experiences and use cases, and the respective editors also use different architectures. In textual languages, users interact with the concrete syntax, entering characters into a text buffer. A parser matches the sequence of characters against the grammar that defines the language’s syntax and then constructs an abstract syntax tree (AST) of the program. The AST contains much more structure than the flat textual notation. Even though modern IDEs construct the AST in real time as the user edits the program (maintaining an always up-to-date AST), users interact with and change the textual source.

Graphical editors are different. For example, if a user drops a UML class from the palette onto the diagram, the underlying tool directly modifies the AST (also called the model in graphical editors). A rendering or projection engine then creates a visual representation of the AST. This approach can be generalized beyond graphical notations: the result is a PE. A PE immediately recognizes every text token as it’s entered. So, there’s never any extraction of the AST from the concrete syntax by a parser.

This has several advantages. Because the PE doesn’t need to extract the program’s structure from a flat source, it can use a variety of notations. MPS, for example, supports textual notation, symbols (such as fraction bars or Σ), tables, and diagrams; these notations can also be mixed. As we’ll show, this notational freedom enables languages that are much closer to the established notations of many application domains.

PEs also simplify implementation of modular languages. Because PEs don’t use grammars, the limitations of composability associated with grammars (and described wonderfully in “Pure and Declarative Syntax Definition: Paradise Lost and Regained” [3]) don’t apply. Of course, language composition still requires alignment of the semantics (which can be a challenge), but from a purely syntactical perspective, composability is unlimited.

Despite these advantages, PEs haven’t seen much adoption because of their drawbacks. What distinguishes MPS from earlier PEs is the extent to which it addresses these drawbacks. The following two drawbacks are the most important.

First, for languages that use a textual syntax, users expect the editor to behave like regular, character-oriented text editors. Because PEs don’t work with sequences of characters, this can be a challenge. MPS addresses this drawback with a variety of approaches [4]; we mention two examples here. MPS supports linear editing of expressions such as 2+3 instead of requiring users to first enter + and then the two arguments. So-called side transformations restructure the tree accordingly if a + is entered on the right side of an expression (such as the 2 in this case). Side transformations take into account precedence, specified declaratively as a number for each operator. MPS also supports cross-tree editing; for example, users can enter parentheses in arbitrary locations to change, for instance, 2+3*4 to (2+3)*4. Earlier PEs didn’t support linear editing or cross-tree editing.
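To make the idea of a side transformation more concrete, the following C sketch shows one way a tree could be restructured when an operator is typed to the right of an operand. It is an illustration under stated assumptions, not MPS’s implementation: the `Node` type, the fixed node pool, and the functions `new_num`, `precedence`, `type_operator_right_of`, and `eval` are hypothetical names, and the sketch handles only typing at the right end of an expression.

```c
#include <stddef.h>

/* Hypothetical AST for this sketch; not MPS's actual data model. */
typedef enum { NUM, BINOP } Kind;

typedef struct Node {
    Kind kind;
    int value;                       /* used when kind == NUM   */
    char op;                         /* used when kind == BINOP */
    struct Node *left, *right, *parent;
} Node;

static Node pool[64];                /* fixed pool keeps the sketch free of malloc */
static size_t used = 0;

Node *new_num(int value) {
    Node *n = &pool[used++];
    n->kind = NUM; n->value = value; n->op = 0;
    n->left = n->right = n->parent = NULL;
    return n;
}

/* Precedence is declared as one number per operator. */
int precedence(char op) { return (op == '*' || op == '/') ? 2 : 1; }

/* Side transformation: the user types `op` and then `value` to the right of
   `leaf`. Climb while the enclosing operator binds at least as tightly as
   `op`, then wrap that subtree as the left operand of a new operator node.
   Returns the new right operand so that typing can continue from there. */
Node *type_operator_right_of(Node **root, Node *leaf, char op, int value) {
    Node *target = leaf;
    while (target->parent && target->parent->right == target &&
           precedence(target->parent->op) >= precedence(op))
        target = target->parent;

    Node *rhs = new_num(value);
    Node *bin = &pool[used++];
    bin->kind = BINOP; bin->op = op; bin->value = 0;
    bin->parent = target->parent;
    bin->left = target;
    bin->right = rhs;
    if (target->parent) target->parent->right = bin; else *root = bin;
    target->parent = bin;
    rhs->parent = bin;
    return rhs;
}

/* Evaluates the tree, which makes the resulting structure observable. */
int eval(const Node *n) {
    if (n->kind == NUM) return n->value;
    int l = eval(n->left), r = eval(n->right);
    switch (n->op) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    default:  return l / r;
    }
}
```

Starting from the single node 2 and typing + 3 and then * 4 produces a tree whose root is + with a * subtree on its right, so the user types linearly while the editor maintains the tree that precedence demands. No parser is involved at any point.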

The second drawback concerns infrastructure integration. PEs don’t store programs as text because that would reintroduce parsing when the program is loaded and hence negate PEs’ advantages. Instead, the AST is persisted, typically as XML. So, PE users must address the integration of these XML files with version control systems: the PE must support diff/merge, using the concrete, projected syntax. MPS supports this and is used routinely with Git or SVN.

Example: Embedded Programming

Embedded software must respect constraints regarding code size, memory, and timing. At the same time, quality, maintainability, and safety are critical.

Challenges

Much of today’s embedded software is developed in C because you can manually optimize C code to meet the constraints we just mentioned. But C lacks the means to build new abstractions effectively, potentially hampering quality, maintainability, and safety. Because new abstractions must incur little or no runtime overhead, the preprocessor is often used to build them. However, preprocessor-based abstractions are brittle because the type checker, the IDE, and static analysis tools have only limited awareness of them. To illustrate this problem, we discuss physical units and state machines.

Embedded software often works with real-world quantities, and annotating types and literals with physical units can enhance type safety by detecting problems such as

double /*metersPerSec*/ speed = 10 /*s*/ / 5 /*m*/;

C doesn’t support annotating physical units to types and literals such that the type checker can detect these problems. You could use macros for conversions (for example, meters to feet, m_to_ft(val)). However, neither the IDE nor the compiler knows about these macros’ semantics, so they can’t check whether the macros are used correctly.

You can also use macros to implement state machines. However, in addition to their brittleness, macros in this case are also hard to read because of their limited syntactic flexibility. In practice, state machines are often implemented in plain C (using switch statements or cross-indexed arrays and function pointers) or with an external state machine modeling tool that generates C code. Both solutions are problematic. In the first case, the state machines’ semantics is lost for the developer, type checker, compiler, and IDE. So, you can’t easily analyze the semantics for dead states or nondeterministic transitions. In the second case, integration is an issue. The modeling tool usually doesn’t know the rest of the C-based system and thus can’t check for the wrong use of variables or functions in a state machine.
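As an illustration of the plain C approach, here is a minimal hand-written state machine built around a switch statement. The states, events, the `Meter` type, and the functions `meter_init` and `meter_handle` are hypothetical and chosen only for this sketch.

```c
/* Hypothetical example: states and events are illustrative only. */
typedef enum { STATE_IDLE, STATE_MEASURING, STATE_ERROR } State;
typedef enum { EV_START, EV_SAMPLE, EV_FAULT, EV_RESET } Event;

typedef struct {
    State state;
    int samples;                     /* local variable of the state machine */
} Meter;

void meter_init(Meter *m) {
    m->state = STATE_IDLE;
    m->samples = 0;
}

void meter_handle(Meter *m, Event ev) {
    switch (m->state) {
    case STATE_IDLE:
        if (ev == EV_START) {
            m->samples = 0;
            m->state = STATE_MEASURING;
        }
        break;
    case STATE_MEASURING:
        if (ev == EV_SAMPLE && m->samples < 3) {
            m->samples++;                    /* guard: fewer than 3 samples */
        } else if (ev == EV_SAMPLE) {
            m->state = STATE_IDLE;           /* guard: 3 samples collected  */
        } else if (ev == EV_FAULT) {
            m->state = STATE_ERROR;
        }
        break;
    case STATE_ERROR:
        if (ev == EV_RESET) {
            m->state = STATE_IDLE;
        }
        break;
    }
}
```

To the compiler and the IDE, this is just an enum, a struct, and nested conditionals. Nothing in the code says that `STATE_ERROR` is a state or that the two `EV_SAMPLE` branches are guarded transitions, which is why a tool cannot easily check such code for dead states or overlapping guards.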

The Solution

mbeddr is a set of 70-plus tightly integrated languages and an IDE. It incrementally adds abstractions for embedded-software development to a modularized version of C. Besides physical units and state machines, mbeddr supports interfaces and components, unit testing, product line variability, requirements tracing, and documentation. Because these abstractions are modular C extensions, users can fall back to C if the higher-level abstractions don’t match the domain or aren’t efficient enough. Users can also build new extensions or create new generators for existing extensions. IDE support, such as type checking, code completion, find usages, and refactoring, works seamlessly across extensions. A special architecture supports extension debugging.

The mbeddr extension for state machines uses textual syntax. State machines contain local variables and event declarations, as well as states with entry and exit actions, and transitions with guards and actions. Events are used to communicate with a state machine’s environment and can be bound to different triggers. (For example, an in event can be bound to an interrupt, and an out event can be bound to a function call.) The default code generator for state machines generates a switch statement. However, as with every mbeddr generator, users can exchange it with an optimizing generator for specific kinds of state machines or target platforms. Because state machines are represented as first-class language concepts, they can be model checked: mbeddr supports detection of dead states and nondeterministic transitions, and users can define additional constraints using temporal logic. mbeddr reports failed properties in terms of the state machine, not the generated code.
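Once states and transitions are explicit data rather than control flow, analyses such as dead-state detection become straightforward. The sketch below is a deliberately simplified stand-in for a model checker: it ignores guards and treats the state machine as a plain graph. The `Transition` type and both function names are assumptions made for this example, not part of mbeddr.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STATES 16                /* the sketch assumes n_states <= MAX_STATES */

/* Hypothetical transition record: guards and actions are omitted. */
typedef struct { int from; int event; int to; } Transition;

/* Marks every state reachable from `initial`; a state left unmarked is dead.
   Flags dead states in dead[] and returns how many there are. */
int find_dead_states(const Transition *ts, size_t n_trans, int n_states,
                     int initial, bool dead[]) {
    bool reached[MAX_STATES] = { false };
    int stack[MAX_STATES];
    int top = 0;

    reached[initial] = true;
    stack[top++] = initial;
    while (top > 0) {
        int s = stack[--top];
        for (size_t i = 0; i < n_trans; i++) {
            if (ts[i].from == s && !reached[ts[i].to]) {
                reached[ts[i].to] = true;   /* mark before pushing: pushed once */
                stack[top++] = ts[i].to;
            }
        }
    }

    int count = 0;
    for (int s = 0; s < n_states; s++) {
        dead[s] = !reached[s];
        if (dead[s]) count++;
    }
    return count;
}

/* Two transitions that leave the same state on the same event are a candidate
   for nondeterminism; a real check would also compare their guards. */
bool has_nondeterministic_candidate(const Transition *ts, size_t n_trans) {
    for (size_t i = 0; i < n_trans; i++)
        for (size_t j = i + 1; j < n_trans; j++)
            if (ts[i].from == ts[j].from && ts[i].event == ts[j].event)
                return true;
    return false;
}
```

Because the result is expressed in state indices, a tool can report the finding in terms of the state machine itself rather than in terms of generated code.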

Another extension provides the ability to annotate types and literals with units, which are integrated with the C type system. The IDE flags an error if, for example, a variable that has the unit m/s is assigned an expression whose computed unit is different. The annotated units affect only the type system, thus incurring no runtime overhead. To convert units, users can employ conversion rules, which are type safe in terms of C types and units.
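One common way to compute the unit of an expression is to represent each unit as a vector of exponents over base units, so that multiplication adds exponents and division subtracts them. The sketch below illustrates that idea for meters and seconds only. It is an assumption about how such a check could work, not a description of mbeddr’s type system, and it runs at program execution time purely for demonstration, whereas a type checker would perform the same computation statically. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical representation: a unit is a vector of exponents over base
   units. Only meters and seconds are modeled in this sketch. */
typedef struct { int m; int s; } Unit;

const Unit METER             = { 1,  0 };
const Unit SECOND            = { 0,  1 };
const Unit METERS_PER_SECOND = { 1, -1 };

/* Multiplying quantities adds the exponents of their units. */
Unit unit_mul(Unit a, Unit b) { return (Unit){ a.m + b.m, a.s + b.s }; }

/* Dividing quantities subtracts the exponents of their units. */
Unit unit_div(Unit a, Unit b) { return (Unit){ a.m - b.m, a.s - b.s }; }

bool unit_equal(Unit a, Unit b) { return a.m == b.m && a.s == b.s; }

/* An assignment is accepted only if the declared unit of the variable
   matches the unit computed for the assigned expression. */
bool assignment_ok(Unit declared, Unit computed) {
    return unit_equal(declared, computed);
}
```

With this representation, assigning meters divided by seconds to a variable declared in m/s is accepted, while seconds divided by meters yields the exponent vector (-1, 1) and is rejected.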

Modularity and Projection

mbeddr’s modularity has several advantages. From a language developer’s perspective, mbeddr’s individual languages are less complex, allowing their relatively independent evolution. Users can choose which extensions to use in a program so that they’re not overwhelmed by a huge, monolithic language. In addition, they can incrementally grow mbeddr toward their domain by creating new, domain-specific extensions.

mbeddr’s C extensions aren’t just coarse-grained extensions that could easily be implemented with escapes in parser-based systems. In particular, the units extend the very fine-grained type and expression syntax. The ability to combine extensions (for example, state machines can use units in the guards) is crucial in practice. MPS’s PE lets mbeddr show the same model in different ways. For example, users can edit a state machine as a table, with events as column headers and states as row headers. The remaining cells contain the transitions for a given state–event combination.
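The following sketch shows what it means to project one model in two ways. A single transition table is rendered once as text and once as a grid with events as columns and states as rows. The model contents and the function names `table_cell`, `project_as_text`, and `project_as_table` are hypothetical, and the output is plain console text rather than an editable projection.

```c
#include <stdio.h>

#define N_STATES 3
#define N_EVENTS 2

/* Hypothetical model: names are illustrative only. */
static const char *STATE_NAMES[N_STATES] = { "idle", "running", "error" };
static const char *EVENT_NAMES[N_EVENTS] = { "start", "fault" };

/* NEXT_STATE[s][e] is the target state, or -1 when no transition exists. */
static const int NEXT_STATE[N_STATES][N_EVENTS] = {
    {  1, -1 },                      /* idle:    start -> running        */
    { -1,  2 },                      /* running: fault -> error          */
    { -1, -1 },                      /* error:   no outgoing transitions */
};

/* Content of one table cell: the target state name, or "" if none. */
const char *table_cell(int state, int event) {
    int target = NEXT_STATE[state][event];
    return target < 0 ? "" : STATE_NAMES[target];
}

/* Textual projection: one block per state, one line per transition. */
void project_as_text(FILE *out) {
    for (int s = 0; s < N_STATES; s++) {
        fprintf(out, "state %s {\n", STATE_NAMES[s]);
        for (int e = 0; e < N_EVENTS; e++)
            if (NEXT_STATE[s][e] >= 0)
                fprintf(out, "  on %s -> %s\n", EVENT_NAMES[e], table_cell(s, e));
        fprintf(out, "}\n");
    }
}

/* Tabular projection: events as column headers, states as row headers. */
void project_as_table(FILE *out) {
    fprintf(out, "%-10s", "");
    for (int e = 0; e < N_EVENTS; e++)
        fprintf(out, "| %-10s", EVENT_NAMES[e]);
    fprintf(out, "\n");
    for (int s = 0; s < N_STATES; s++) {
        fprintf(out, "%-10s", STATE_NAMES[s]);
        for (int e = 0; e < N_EVENTS; e++)
            fprintf(out, "| %-10s", table_cell(s, e));
        fprintf(out, "\n");
    }
}
```

Both functions read the same data, so an edit made through either view changes the one underlying model. That shared model is the property a PE relies on.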

FIGURE 1. Embedding code in requirements. (a) A requirement whose prose description contains variable definitions and a formula. The requirement also contains a pricing table as an example extension. The variables, formula, and table are program elements and not just formatted text. (b) Traces attached to program code. Users can attach such traces to programs expressed in any language.

Experience

An industrial project is developing a smart meter, with hard real-time requirements and a memory-constrained target platform. Experience from this project shows that mbeddr’s abstractions lead to more maintainable and testable software, while not exceeding the target hardware’s resources. Siemens PLM Software has selected mbeddr as the basis of its new control-engineering tool. Among other things, the tool will add support for graphical data flow models and tabular data dictionaries to mbeddr. For a summary of the experience with mbeddr, see “Preliminary Experience of Using mbeddr for Developing Embedded Software” [5].

Example: Requirements Engineering

Requirements are usually expressed as prose, plus some structured data such as tables. Tool support beyond Word or Excel exists, exemplified by DOORS. But even DOORS expresses requirements mostly as prose. Many development processes and industry standards mandate requirements tracing, which connects implementation artifacts to the requirements driving them. This aids maintainability because it lets you connect a requirement change to the potentially affected parts of the implementation.

Challenges

Computers are still unable to adequately process prose. Also, the developers who read the requirements will likely misunderstand some of them and implement the wrong functionality. You can address this problem by using controlled natural language, trying to be precise in writing prose. Or, you can use suitable machine-processable languages to express those requirements that can be formalized or structured. However, close integration between prose and such a diverse and growing set of DSLs is necessary because some parts of requirements will always be expressed as prose. Creating complete, consistent traces takes much work and requires discipline. It also requires tool support: it must be possible to attach a trace to arbitrary program elements, expressed in any implementation language. Most development tools don’t support this.

The Solution

mbeddr includes a requirements language that represents each requirement with a unique ID, summary, and prose description. It also allows embedding code expressed in any language into requirements [6], with full IDE support for those languages. Requirements elicitation can start with prose; as the understanding grows, you can formalize some requirements with DSLs.

FIGURE 2. An example from a language that used a conditional assignment based on a programming-language-inspired notation. Users rejected it because they felt it applied general programming syntax to a specialized insurance problem.

For example, Figure 1a contains a table used for price calculation in a hypothetical telecommunications company. The table isn’t just formatting: countries and price groups are references to variables defined elsewhere. To further support integrating the prose and formal aspects, users can embed arbitrary program nodes in prose. The example in Figure 1a embeds variable definitions, which are automatically renamed during refactorings. It also embeds price calculation formulas as real, type-checked expressions. During implementation, Java or C code can directly reference variables, formulas, or the pricing table. Code generators translate these formal descriptions into executable code [7].

Figure 1b shows an mbeddr state machine. Some states have traces pointing to requirements. Users can attach traces to program elements expressed in any language. Also, because the traces are actual pointers to requirements, they can be followed in reverse: mbeddr supports reports showing which requirements are traced from which program elements.
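Because a trace is a real pointer rather than a textual ID in a comment, producing the reverse report amounts to a simple query. The sketch below illustrates this with hypothetical `Requirement` and `Trace` records and a hypothetical function `elements_tracing_to`; it does not reflect how mbeddr stores traces.

```c
#include <stddef.h>

/* Hypothetical records: IDs, summaries, and element names are illustrative. */
typedef struct { int id; const char *summary; } Requirement;

/* A trace points from a program element to the requirement that drives it. */
typedef struct { const char *element; const Requirement *target; } Trace;

/* Follows traces in reverse: collects the program elements that trace to
   `req` into out[] and returns how many were found (at most `max`). */
size_t elements_tracing_to(const Trace *traces, size_t n,
                           const Requirement *req,
                           const char *out[], size_t max) {
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++)
        if (traces[i].target == req)
            out[found++] = traces[i].element;
    return found;
}
```

A requirement for which this query returns zero elements has no traced implementation yet, which is the kind of gap a trace report is meant to expose.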

Modularity and Projection

Modularity is crucial for this approach. The basic requirements language is generic and reusable. For particular domains, users can develop DSLs and plug them into the requirements language seamlessly. As the previous mbeddr C examples illustrate, even these DSLs can be extended further. For example, an expression language that you can embed in various kinds of business rules is a useful reusable asset.

Tracing also relies on language composition and projectional editing by using MPS’s annotations. These are special nodes users can attach to arbitrary program nodes, without the program nodes’ definitions being aware of that. This is useful for “metadata” that is used by specialized tools (for example, a trace analyzer) but doesn’t affect the core program’s semantics. Documentation, requirements traces, or specifications of architectural layers are examples of such metadata.

The requirements language also exploits the benefits of projectional editing. The ability to seamlessly mix unstructured prose with structured program nodes is extremely helpful for descriptions in which prose has played, and will probably always play, an important role. Requirements engineers and domain experts can start with prose and then enrich it with formal aspects such as references to domain entities, embedded formulas, or product-specific value assignments to variables in the context of product lines [7]. Users can define new “embeddable words” at any time.

The ability to use notations that aren’t typically associated with languages, such as tables or mathematical formulas, is extremely useful in requirements engineering, in which users typically aren’t programmers. Projectional editing is a good fit because it supports such notations.

Experience

When MPS got support for mixing prose and program nodes (through a plugin by Sascha Lisson), we realized its potential for requirements engineering as well. Several requirements elicitation projects have used the requirements language, which received positive user feedback.

Example: Insurance Rules

In the insurance industry, new products are defined regularly. The definition includes many, often complex, mathematical formulas to calculate values for premiums, annuities, reserves, or dividends. These formulas are important to an insurance company’s success. For example, if a premium is calculated too low, the company might lose money, whereas a premium that’s too high might hurt sales. Time to market is also important: companies must respond quickly to market opportunities or changes in the law by offering tailored insurance products.

Challenges

Typically, actuaries write and maintain the formulas, and they’re insurance math experts, not programmers. There are two common approaches to implementing insurance formulas.

In the first one, actuaries write the formulas in an informal language and give this specification to programmers to implement (for example, in Java). This approach has well-known problems: it might introduce errors in communication between the involved people, and development is slow because of the manual, multistep process.

Alternatively, actuaries can write the formulas in a formal language with downstream code generators, without involving programmers. These formal languages are essentially simplified programming languages. To use them, actuaries must, to a degree, become programmers, making this approach a tough sell.

The challenge is to define a language that’s formal enough for analysis and code generation but doesn’t alienate or overwhelm actuaries. Figure 2 shows an early attempt at such a language that used a conditional assignment based on a programming-language-inspired notation. Users rejected it categorically because they felt it applied general programming syntax to a specialized insurance problem.

The Solution

Figure 3 shows the same conditional assignment with a notation using a column layout instead of keywords. The actuaries found this notation easy to grasp and accepted it without problems. They also needed a more easily understandable notation for complex formulas such as the one from the first when clause in Figure 2. Although a computer can easily parse this linear structure, it is difficult for humans. So, we used a nonlinear, mathematical notation similar to what actuaries use on paper. The first line in Figure 3 shows the same formula.

FIGURE 3. A domain-specific language for insurance that uses a column layout, placeholders, and mathematical notation. The graphical elements and placeholders helped with user acceptance of the language.

The notation makes the structure immediately clear to users.

Nonprogrammers want the computer to indicate what to do next. So, placeholders (such as <> in Figure 3) indicate where to enter data in the code. Users feel this is easier than an empty editor that requires them to figure out what’s allowed next, possibly through code completion. Partial projections, configured through global settings, let users hide parts or aspects of programs, helping them focus on the task at hand. One such task is debugging. Because expressions have no side effects, debugging doesn’t require stepping through the program. Instead, MPS can illustrate the computation just by showing all intermediate results (see Figure 4). Actuaries have found “test-driven rule development” and the ability to overlay the test data over the insurance rules very helpful.

Modularity and Projection

Projectional editing is essential for this system. Nonlinear notations such as sum symbols, fraction bars, column layouts, or tables aren’t practical with parsers. MPS can also use buttons, check boxes, or labels and text fields in an editor. This allows mixing language-like notations (expressions, statements, and math symbols) with UI elements known to the users from form-like applications, further lowering the adoption barrier.

The debug notation in Figure 4c also illustrates the power of projectional editing; the tree-like notation that shows the value of each subexpression is essentially an automatic side effect of the tree structure of expressions. MPS shows the debug information only when the debugger is activated. The information is read-only (computed by an interpreter) and updated automatically as the test data changes.
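The idea of debugging by showing every intermediate result can be sketched with a small tree-walking interpreter that stores each subexpression’s value on its node. The `Expr` type and the functions `evaluate` and `project_debug` are hypothetical, and the indented console output only hints at the annotated notation that the figure describes.

```c
#include <stdio.h>

/* Hypothetical expression tree for this sketch. */
typedef enum { LIT, ADD, MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    double literal;                  /* used when kind == LIT */
    struct Expr *left, *right;       /* used for ADD and MUL  */
    double result;                   /* filled in by evaluate() */
} Expr;

/* Side-effect-free evaluation: every node records its own value, so no
   stepping is needed to see how the final result came about. */
double evaluate(Expr *e) {
    switch (e->kind) {
    case LIT: e->result = e->literal; break;
    case ADD: e->result = evaluate(e->left) + evaluate(e->right); break;
    case MUL: e->result = evaluate(e->left) * evaluate(e->right); break;
    }
    return e->result;
}

/* Debug projection: one line per subexpression, indented by tree depth.
   Call evaluate() first so that the result fields are populated. */
void project_debug(const Expr *e, int depth, FILE *out) {
    static const char *LABELS[] = { "literal", "+", "*" };
    fprintf(out, "%*s%s = %g\n", depth * 2, "", LABELS[e->kind], e->result);
    if (e->kind != LIT) {
        project_debug(e->left, depth + 1, out);
        project_debug(e->right, depth + 1, out);
    }
}
```

For the expression (2 + 3) * 4, the projection lists the product, the sum, and each literal together with its value. Re-running `evaluate` after changing a literal refreshes every annotation, which mirrors the automatic update when test data changes.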

Experience

We’re still developing the insurance language. Actuaries have provided enthusiastic feedback about the notation and suggested many new ideas to extend the notation with more insurance-specific symbols and structures. This experience is in stark contrast to earlier attempts based on more classic programming-language notations and tools. We plan to implement more aspects of the overall insurance product development workflow for the customer, integrating the formulas we discussed. MPS’s support for language composition makes this possible with little effort.

FIGURE 4. Employing MPS (Meta Programming System) to design a language for specifying insurance rules. (a) An expression used in insurance rules that employs a decision table, a compact notation for nested if statements. (b) Two test cases. The first argument is prd; the second one is prs. (c) The same expression in debug mode. The intermediate result of every subexpression (computed by an interpreter specifically for a test case) is annotated over or to the left of the expression.

On the basis of our experience and feedback from users in domains as diverse as the ones we described here, we conclude that the support for wide-ranging modularity and nontextual notations provided by projectional editing has significant advantages over traditional languages. It enables the use of languages for tasks for which they weren’t previously feasible. However, the development of languages in PEs also needs evaluation. Our experience is that MPS is a productive LWB, as measured by the language implementations’ size and the effort spent. For example, we built the state machine extension to mbeddr C in roughly two person-months. Both MPS and mbeddr are open source; you can try them for yourself. Finally, it isn’t absolutely clear that you need PEs to build systems such as the ones we described. Perhaps parsing could be extended to support the necessary features. However, no such systems built with parser technology exist. This question also needs more systematic research.

References

  1. S. Erdweg, P.G. Giarrusso, and T. Rendel, “Language Composition Untangled,” Proc. 12th Workshop Language Descriptions, Tools, and Applications (LDTA 12), 2012, article 7.
  2. M. Simi and F. Campagne, “Composable Languages for Bioinformatics: The NYoSh Experiment,” PeerJ, 2 Jan. 2014.
  3. L.C.L. Kats, E. Visser, and G. Wachsmuth, “Pure and Declarative Syntax Definition: Paradise Lost and Regained,” Proc. 25th Ann. ACM SIGPLAN Conf. Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 10), 2010, pp. 918–932.
  4. M. Voelter et al., “mbeddr: Instantiating a Language Workbench in the Embedded Software Domain,” Automated Software Eng., vol. 20, no. 3, 2013, pp. 339–390.
  5. M. Voelter, “Preliminary Experience of Using mbeddr for Developing Embedded Software,” Proc. 10th Dagstuhl Workshop Model-Based Development of Embedded Systems, 2014.
  6. M. Voelter, D. Ratiu, and F. Tomassetti, “Requirements as First-Class Citizens: Integrating Requirements Closely with Implementation Artifacts,” Proc. 6th Int’l Workshop Model Based Architecting and Construction of Embedded Systems, 2013.
  7. M. Voelter, “Integrating Prose as First-Class Citizens with Models and Code,” Proc. 7th Int’l Workshop Multi-paradigm Modeling (MPM 13), 2013, pp. 17–26.
  8. S. Erdweg et al., “The State of the Art in Language Workbenches,” Software Language Engineering, M. Erwig, R. Paige, and E. Wyk, eds., LNCS 8225, Springer, 2013, pp. 197–217.

About the Authors

Markus Voelter is an independent researcher and a consultant and coach for itemis. His research interests are language engineering, software architecture, and product lines. He’s currently engineering languages for technical and business domains, mostly with MPS (Meta Programming System). Voelter received a PhD in computer science from TU Delft. Contact him at voelter@acm.org; www.voelter.de.

Jos Warmer is an independent consultant, architect, and coach on model-driven development. He focuses on designing domain-specific languages and building tools to support their use. Warmer received a master’s in mathematics and computer science from the University of Amsterdam. Contact him at jos.warmer@openmodeling.nl.

Bernd Kolb is an architect and team lead at itemis. His research interests are domain-specific languages and model-driven software development. Kolb received a master’s in software engineering from Albstadt-Sigmaringen University. Contact him at bernd.kolb@gmail.com.

 

Language Workbenches

Martin Fowler introduced the term “language workbench” in 2005 [1]. However, such tools already existed in the ’80s and ’90s, for example, the Synthesizer Generator [2] and the ASF+SDF Meta-environment [3]. Current examples include Rascal [4], Spoofax [5], and MPS (JetBrains Meta Programming System). (For more on MPS, see the main article.) “The State of the Art in Language Workbenches” provides an overview and comparison of today’s LWBs [6].

  1. M. Fowler, “Language Workbenches: The Killer-App for Domain Specific Languages?,” 2005.
  2. T.W. Reps and T. Teitelbaum, “The Synthesizer Generator,” Proc. 1st ACM SIGSOFT/SIGPLAN Software Eng. Symp. Practical Software Development Environments, 1984, pp. 42–48.
  3. P. Klint, “A Meta-environment for Generating Programming Environments,” ACM Trans. Software Eng. and Methodology, vol. 2, no. 2, 1993, pp. 176–201.
  4. P. Klint, T. Van Der Storm, and J. Vinju, “Easy Meta-programming with Rascal,” Generative and Transformational Techniques in Software Engineering III, Springer, 2011, pp. 222–289.
  5. L.C. Kats and E. Visser, “The Spoofax Language Workbench: Rules for Declarative Specification of Languages and IDEs,” ACM Sigplan Notices, vol. 45, no. 10, 2010, pp. 444–463.
  6. S. Erdweg et al., “The State of the Art in Language Workbenches,” Software Language Engineering, M. Erwig, R. Paige, and E. Wyk, eds., LNCS 8225, Springer, 2013, pp. 197–217.

 
