How DSLs Withstand the Test of Time

Key Takeaways

  • Domain-specific languages allow non-developer domain experts to contribute to the software development process without external, costly mediation.
  • To succeed, however, DSLs must address the needs of their target audience better than existing alternatives.
  • DSLs that are closed for modification but open for extension are more likely to remain relevant, even as the target audience grows and requirements evolve.
  • Markdown, TeX, and CSS are examples of languages that have remained relevant even two decades after their creation.
  • An initial focus on the specific needs of a small target segment, together with intentional or accidental extension mechanisms, seems to be a core factor behind DSL longevity.

Few DSLs have remained successful and widely used for decades, particularly with only infrequent updates over the years. Markdown, TeX, and CSS are mature languages that have each existed for more than 16 years. In this article, we examine how the key design decisions that underlay the development of each language resulted in its continued relevance across new use cases that appeared over more than a decade.

Markdown

Matthew Guay, founding editor of the capiche knowledge repository, recently published an essay that recalls the story of Markdown. Guay wrote:

"Email was plain ASCII," recalled Low End Mac’s Dan Knight in a 2000 article on email formatting. "No bold face. No italic. No fonts. No color. Just text."

So early email senders got creative. They borrowed from the world of typewriting, with two dashes for an em dash, lines of asterisks to indicate line breaks, and underlined text for headers.

They added comic strip-style asterisks and underscores around words to add emphasis.

Lists came with both dashes and asterisks; quotes and links came with new tech-influenced greater-than symbols and square brackets.

Markdown creator John Gruber would formalize the last iteration of the Markdown language in 2004, after a few years of experimentation. He explained the objectives of the language:

In fact, I love writing email. Email is my favorite writing medium. I’ve sent over 16,000 emails in the last five years. The conventions of plain text email allow me to express myself clearly and precisely, without ever getting in my way.

Thus, Markdown. Email-style writing for the web.

[…] It’s aimed at a sweet spot, between making it easy to use real HTML when you need it, and letting you just write plain text for anything where it’s sensible and obvious to do so.

[…] When you write and read text that’s marked-up with HTML tags, it’s forcing you to concentrate on the think of it. It’s the feel of it that I want Markdown-formatted text to convey.

Markdown coexisted with similar formats, including atx, Textile, and reStructuredText. Markdown did not try to replicate the expressivity of HTML; it focused on the user experience of its target audience (web bloggers) rather than that of advanced users. Advanced users could, however, drop down to HTML when required. Popular companies such as WordPress and Basecamp started supporting Markdown, and were followed by bloggers, drawn in by the seemingly natural syntax.

Markdown has been finished as a language since 2004. However, numerous extensions have sought to add features to support the needs of a larger and more diverse user base seeking to publish content on the web. Those features include tables, footnotes, cross-references, mathematical formulas, and more.

Markdown may have prevailed over the alternatives in the short term by providing an optimal user experience (when both reading and producing content) for a non-technical target segment (web bloggers), sometimes at the expense of the expressivity and simplicity of its grammar. Formatting markers most often consist of short, infrequently used symbols (e.g., #, backquotes, triple backquotes, brackets, and parentheses), which arguably lets a reader distinguish readable content from formatting instructions faster. By comparison, the competing Textile format, which shares many of Markdown's goals, has a lower content-to-formatting ratio, even though it offers more core features:

h2. Textile
Textile integrations are available for "a wide range of platforms":/article/.

pre. Pre-formatted       text

bc. 10 PRINT "I ROCK AT BASIC!"
20 GOTO 10 // Block code!

bq.. Beginning of the quotation.

Another paragraph, also part of the quote.

p. Normal text continues here.

p<. Left aligned paragraph.

p{color:red;letter-spacing:0.25em}. This is red spaced text.

Global warming in the 21[^st^] century is mostly caused by CO[~2~] in the atmosphere.

###.. ////////////////////////////
// This multi-line comment
// will keep going and going

// even if you put blank lines in.
//////////////////////////////////

In the long term, however, it may be the lack of a prescriptive syntax and grammar that allowed a large number of extensions to the format. Because the needs of non-technical content authors are also needs of technical users, new uses could be implemented by extending rather than modifying the Markdown core. In the terms of the open/closed principle, Markdown is closed for modification but remains open for extension.
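
The open/closed idea can be sketched in a few lines of Python. This is a toy illustration, not how any real Markdown implementation is structured: the names `CORE_RULES`, `render_inline`, and `strikethrough` are hypothetical, and only two inline rules are modeled. The point is that a new syntax is added by passing an extra rule, while the core list stays untouched.

```python
import re
from typing import Callable, List, Sequence

# An inline rule rewrites one piece of text; all names here are hypothetical.
InlineRule = Callable[[str], str]


def strong(text: str) -> str:
    """Core rule: **text** becomes <strong>text</strong>."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)


def emphasis(text: str) -> str:
    """Core rule: *text* becomes <em>text</em>."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


# The "closed" core: extensions never edit this list.
CORE_RULES: List[InlineRule] = [strong, emphasis]


def render_inline(text: str, extensions: Sequence[InlineRule] = ()) -> str:
    """Apply extension rules first, then the unchanged core rules."""
    for rule in [*extensions, *CORE_RULES]:
        text = rule(text)
    return text


def strikethrough(text: str) -> str:
    """Hypothetical extension rule: ~~text~~ becomes <del>text</del>."""
    return re.sub(r"~~(.+?)~~", r"<del>\1</del>", text)


print(render_inline("~~old~~ *new*"))                   # ~~old~~ <em>new</em>
print(render_inline("~~old~~ *new*", [strikethrough]))  # <del>old</del> <em>new</em>
```

Text written for the core alone renders the same way with or without the extension, which is the property that lets a format grow without breaking existing documents.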

TeX

TeX is a typesetting system that produces beautifully formatted complex mathematical formulae. Created by mathematician and professor emeritus Donald Knuth to typeset the volumes of his The Art of Computer Programming books, it is reputed to be one of the most sophisticated digital typographical systems. Knuth explained the genesis of TeX as follows:

In February 1977 I saw for the first time the output of a high-quality digital typesetter, which had more than 1,000 dots per inch… and it looked perfect, every bit as good as the best metal typography I had ever seen […]

In other words, the problem of printing beautiful books had changed from a problem of metallurgy to a problem of optics to a problem of computer science. […]

With the METAFONT system for type design, and the TEX system for putting letters and symbols into the right positions on a page, anybody who wants to write a beautiful book can now do so singlehandedly with a reasonable amount of effort.

TeX had clearly attractive features. It was free, text-based, and portable, and it consistently produced the same output across platforms. It empowered authors (and their secretaries) to decide the typesetting of their content directly and accurately. Existing solutions for mathematical typesetting had few or none of these properties. The TeX User Group explained the reasons behind the success of TeX:

From the start, it has been popular among mathematicians, physicists, astrophysicists, astronomers, any research scientists who were plagued by lack of the necessary symbols on typewriters and who wanted a more professional look to their preprints.

To produce his own books, Knuth had to tackle all the paraphernalia of academic publishing - footnotes, floating insertions (figures and tables), etc., etc. As a mathematician/computer scientist, he developed an input language that makes sense to other scientists, and for math expressions, is quite similar to how one mathematician would recite a string of notation to another on the telephone.

The TeX specifications were frozen in 1990, with newer versions only incorporating bug fixes. TeX includes a real, Turing-complete programming language with macros, and those macros have been used to extend the language.

The most popular general-purpose extensions are LaTeX and ConTeXt, alongside highly specific extensions (e.g., MusiXTeX for typesetting complex music sheets). LaTeX, which provides a simpler interface to TeX, extends the TeX audience beyond hard-core mathematician circles. LaTeX is widely used in academia for the communication and publication of scientific documents in many fields beyond mathematics, including computer science, engineering, physics, economics, linguistics, quantitative psychology, philosophy, and political science. It also has a prominent role in the preparation and publication of books and articles that contain complex multilingual materials, such as Sanskrit and Greek.

The following TeX markup:

The quadratic formula is $$-b \pm \sqrt{b^2 - 4ac} \over 2a$$

produces the following output:

The quadratic formula is (rendered formula: a fraction with numerator −b ± √(b² − 4ac) and denominator 2a)

The short-term success of TeX may simply be due to its unmatched output and the severe constraints of the alternatives. TeX was designed for a fairly proficient target audience of mathematicians. As such, it could afford to be complex and to embed a Turing-complete language that enables complete parameterization of how a document is rendered.

The Turing-completeness and macro system of TeX may be the most important factors behind its longevity, as it extended to new segments of users and addressed unforeseen use cases. While it took another mathematician (Leslie Lamport, recipient of the Turing Award) to create LaTeX, it was possible to do so with TeX macros and without modifying the original TeX code or specifications.
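
As a small illustration of extension through macros, the quadratic formula shown earlier can be wrapped in a reusable plain TeX macro. The macro name \quadratic is hypothetical and chosen for this example only, and the sketch assumes single-letter coefficients. Nothing in TeX itself changes; the new command is defined entirely in the document.

```tex
% Hypothetical macro: \quadratic{a}{b}{c} typesets the roots of ax^2 + bx + c = 0.
% The extra pair of braces confines \over to the macro body.
\def\quadratic#1#2#3{{-#2 \pm \sqrt{#2^2 - 4#1#3} \over 2#1}}

The quadratic formula is $$\quadratic{a}{b}{c}$$

With other coefficient names: $$\quadratic{p}{q}{r}$$
\bye
```

Macro packages such as LaTeX apply the same mechanism at a much larger scale, layering higher-level commands on top of the frozen TeX primitives.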


(example of thesis produced by LaTeX)

CSS

Work on CSS started in 1994. As another famous web DSL, HTML, was growing to meet the needs of web developers outside of scientific environments, it increasingly incorporated presentational tags to specify a web site's appearance (e.g., <center> to center text in Netscape, one of the first web browsers). Håkon Wium Lie explained in his thesis the issue with the evolution of HTML from a structural DSL toward a presentational DSL:

Determining the right abstraction level is an important part of designing a document format. If the abstraction level is high, both the authoring process and the task of formatting the document become more complex. The author must relate to non-visible abstract concepts. The benefit of a high abstraction level is that the content can be reused in many contexts. For example, a headline can be presented in large letters on printed sheets, and with a louder voice in a text-to-speech system.
[…]
Conversely, a low level of abstraction will make the authoring and formatting process easier (up to a point). Authors can use visually oriented WYSIWYG (What You See Is What You Get) tools, and the browser does not have to perform extensive transformations before presenting the document. The drawback of using presentation-oriented document formats is that the content is not easily reusable in other contexts. For example, it can be difficult to make presentation-oriented documents available on a device with a different screen size, or to a visually impaired person.
[…]
The introduction of presentational tags in HTML was a downwards move on the ladder of abstraction. Several of the new elements (e.g., BLINK) were meaningful only for particular output devices (how is blinking text displayed in a text-to-speech system?). The creators of HTML intended it to be usable in many settings but presentational tags threatened device independence, accessibility and content reuse.

CSS was designed with the following design characteristics:

  • usable by non-programmers
  • a declarative language that is not Turing-complete
  • supports progressive rendering, in order to display content to the user as quickly as possible
  • should work with any structured markup language, including HTML

This is to be contrasted with the competing Turing-complete DSSSL, which used a Lisp-like pure functional language with an abundance of parentheses:

(define (create-heading heading-font-size)
  (make paragraph
        font-size: heading-font-size
        font-weight: 'bold))

(element h1 (create-heading 24pt))
(element h2 (create-heading 18pt))

Another style-sheet format, the XML-based XSL Formatting Objects (XSL-FO), allowed its users to encode content in the XML language of their choice (e.g., DocBook) and provide an XSLT transform that would turn that content into an XSL-FO styled document. Some authors, however, found the resulting stylesheets impractical to edit manually. Additionally, producing XSL-FO content through an XSLT transformation generally required the pre-transform content to be entirely available. Progressive rendering, i.e., the ability to stream partial content, was thus often not an option.

Lie explained further that CSS distinguished itself from competing formats by its extensibility and customizability:

  • Forward-compatible parsing rules allow the language to be extended.
  • Style sheets could be customized to certain output devices by means of media types.
  • Style sheets could be customized by users through the concept of cascading, thus overriding selected parts of an author design.
  • Style could be customized according to information not available in the document structure, by means of pseudo-classes and pseudo-elements (e.g., :hover, :visited, ::first-line).
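
The cascading item in the list above can be sketched as a small precedence function. This is a simplified model that assumes the origin and importance ordering used by CSS 2 (user agent, then user, then author, then author !important, then user !important); it ignores selector specificity, and the names `Declaration`, `PRECEDENCE`, and `cascade` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Declaration(NamedTuple):
    origin: str              # "user-agent", "user", or "author"
    prop: str                # property name, e.g. "color"
    value: str
    important: bool = False  # True when marked !important


# Assumed CSS 2 ordering, from weakest to strongest. Only these five
# combinations are modeled in this sketch.
PRECEDENCE: Dict[Tuple[str, bool], int] = {
    ("user-agent", False): 0,
    ("user", False): 1,
    ("author", False): 2,
    ("author", True): 3,
    ("user", True): 4,
}


def cascade(declarations: Iterable[Declaration]) -> Dict[str, str]:
    """Pick one winning value per property; on a tie, the later one wins."""
    winners: Dict[str, Tuple[int, str]] = {}
    for decl in declarations:
        weight = PRECEDENCE[(decl.origin, decl.important)]
        current = winners.get(decl.prop)
        if current is None or weight >= current[0]:
            winners[decl.prop] = (weight, decl.value)
    return {prop: value for prop, (_, value) in winners.items()}


sheet = [
    Declaration("user-agent", "color", "black"),
    Declaration("author", "color", "gray"),
    Declaration("author", "font-size", "10pt"),
    Declaration("user", "font-size", "18pt", important=True),
]
print(cascade(sheet))  # {'color': 'gray', 'font-size': '18pt'}
```

Here the author's color wins over the browser default, while the reader's important font size overrides the author's choice, which is the kind of selective override the list describes.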

CSS was not an immediate success. In fact, the Web Standards Project believes that CSS did not become truly usable until 2001, five years after becoming a W3C recommendation, when Internet Explorer 6.0 shipped with incomplete, quirky, but good-enough support for it.

Since its creation (CSS Level 1), CSS has been continuously extended while largely maintaining backwards compatibility.

CSS's short-term success over the alternatives may stem from how well it satisfied the requirements of the web (progressive rendering, robustness, ability to combine multiple style sheets, media-specific style sheets, and more). In particular, CSS was not designed with a specific browser technology in mind. While some CSS features (contextual selectors) may have delayed the adoption of CSS across browsers, CSS was designed at a high enough level of abstraction that it could address the styling needs of miscellaneous media.

Additionally, its forward-compatible parsing rules made it possible to extend CSS without major breakage to adjust to unforeseen use cases. CSS has been used to specify the styling of native mobile applications and PDF documents. The EPUB and MOBI eReader formats use CSS (and HTML) under the hood. The chameleonic aptitude of CSS and its extensibility are probably the main factors behind its longevity.
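
The forward-compatible parsing idea can be sketched as follows: a parser keeps the declarations it understands and silently skips the rest, so a style sheet written for newer software still works on older software. This is a minimal model rather than a real CSS parser; splitting on ";" ignores strings and nested blocks, and the names `CSS1_ERA_PROPERTIES` and `parse_declarations` are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical set of properties an older implementation understands.
CSS1_ERA_PROPERTIES: FrozenSet[str] = frozenset(
    {"color", "font-size", "font-weight"}
)


def parse_declarations(
    block: str, known: FrozenSet[str] = CSS1_ERA_PROPERTIES
) -> Dict[str, str]:
    """Parse 'name: value; ...' and drop anything unknown or malformed."""
    declarations: Dict[str, str] = {}
    for chunk in block.split(";"):
        name, sep, value = chunk.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep or not name or not value:
            continue  # malformed declaration: skip it, keep parsing
        if name not in known:
            continue  # unknown property: ignore it instead of failing
        declarations[name] = value
    return declarations


rule = "color: red; text-shadow: 1px 1px black; font-size: 12pt"

# An older implementation ignores the property it does not know.
print(parse_declarations(rule))

# A newer implementation that also knows text-shadow keeps all three.
print(parse_declarations(rule, CSS1_ERA_PROPERTIES | {"text-shadow"}))
```

Because unknown input degrades to "no effect" rather than an error, new properties can be introduced without breaking existing readers of the format.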

Conclusions

In conclusion, for all three of the examined languages, short-term success resulted from providing a small segment with a better user experience or superior output than the alternatives. Markdown arguably fit the mental model of bloggers better and felt more natural than its competitors. Bloggers could focus on content (the value that they were adding and their raison d’être) rather than on presentational markup or a complex syntax. TeX let mathematicians take control of the typesetting of their books away from publishers, while maintaining unmatched typographic quality. The shortened feedback loop between raw content and typeset content empowered mathematicians to spend more time communicating their ideas to their audience in a structured, consistent, and appealing way, which is the core value of books. CSS fit the requirements of the web ecosystem better (e.g., web platform, web users, web designers).

However, the fact that those languages have not seen modifications in their core over nearly two decades is a testament to their ability to be reused and extended without being modified. That, in turn, is a result of their design.

Markdown followed a user-centered design that favored the experience of writers and readers over the consistency of the syntax and the simplicity of its grammar. One PEG grammar that attempts to implement the Markdown specifications is, as a matter of fact, 700 lines long. However, not having a formal grammar allowed Markdown to be easily extended with greater syntactic freedom. This made it possible to adopt new syntax that favored user experience over ease or consistency of implementation.

TeX, thanks to its being Turing-complete and its embedded macro language, could be extended into LaTeX, which provides a more user-friendly interface that can be used by a non-scientific audience. Turing-completeness has its drawbacks and it took another award-winning mathematician to write LaTeX. While LaTeX or TeX package authors need programming skills, LaTeX users can write their documents declaratively with a richer arsenal of macros provided by external packages, and without any change in the quality of the original TeX typesetting.

CSS was, from the get-go, conceived at a level of abstraction that made it extensible both in syntax and in scope. Interestingly, this may have made it harder for existing browsers to implement and delayed its success. That same abstraction, however, ensured its continued relevance two decades after its first iteration.

Declarative domain-specific languages allow domain experts to efficiently describe the parts of a problem space they are interested in, independently of the solution space. In a previous article for InfoQ, Markus Voelter explained at length the benefits of DSLs:

Domain-specific languages allow non-developer domain experts to contribute directly to the software development process. […] From a software architect’s perspective, DSLs allow the complete separation of the business logic from the implementation technology, thereby avoiding the trap of legacy systems. […] In contrast to popular belief, the approach works in practice [and there is] a collection of real-world stories that illustrate how companies benefit from the approach.

To succeed, DSLs should focus on their specific domain as much as on the user of the language. To withstand the test of time, DSLs should be designed to be extensible with little or no modification.

About the Author

Bruno Couriol holds an MSc in Telecommunications, a BSc in Mathematics, and an MBA from INSEAD. Starting with Accenture, most of his career has been spent as a consultant, helping large companies address their critical strategic, organizational, and technical issues. In the last few years, he developed a focus on the intersection of business, technology, and entrepreneurship.
