Writing A Textual DSL Using 'OSLO'

Introduction

Microsoft unveiled the building blocks of its “OSLO” vision at the PDC event in Los Angeles. Oslo has three main components:

  • M, a modeling language for textual DSLs
  • Quadrant, a design surface for graphical DSLs
  • A relational database repository that stores these models

Textual language development involves three core languages, although technically a developer authors in only two of them:

  • MGrammar: defines grammars for Syntax Directed Translation.
  • MSchema: defines schemas for a semantic model that model-aware runtimes can use.
  • MGraph: represents the object graph produced when a given textual input is parsed by a parser defined in MGrammar.

Since a date DSL is the 'hello world' of internal DSLs, I figured it would be a good exercise to write a DSL for dates using Mg, and to write code that processes the generated MGraph to translate a limited subset of arbitrary natural-language input into a valid date. The article assumes basic familiarity with the MGrammar language.

The problem

The goal is to write a small DSL for expressing dates in natural language. It is NOT to write the best natural-language DSL for parsing dates, but to explore the capabilities of the MGrammar language. Here is what the DSL should support, at least in this incarnation:

  • “Today”, “tomorrow”, “day after tomorrow”
  • “Next Monday”, “next year”, “last month”, “last week”
  • “5 days later”, “1 year ago”, “a year later”, “5 months before”
  • “5 days from today”, “10 days before yesterday”
  • And of course it should accept a date literal as well!

Breaking down the problem

We need to break down a sentence that contains a date expression into parts of speech that we can then translate to an MGraph. The MGraph can then be interpreted as a set of functions or operations that yield the inferred date. The goal of the classification is to map a sentence in subject-object-predicate form onto a mathematical notation of operators and operands that can be used to compute the implied date.

With that in mind, we classify a date expression as one of the following:

  • Absolute Date Expression - An absolute date needs little explanation; for simplicity, we include only dates in the form ‘mm/dd/yyyy’, e.g. 11/20/2008.
  • Date Primitive Expression - Date primitives are words or combinations of words that behave like constants without being absolute dates, e.g. today, tomorrow, yesterday.
  • Unary Date Expression - One that involves an operator and only one operand.
  • Binary Date Expression – One that involves an operator and two operands.
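To make the four categories concrete, here is a minimal Python sketch of a keyword-based classifier (Python is used purely for illustration). The keyword sets and the `classify` function are hypothetical, drawn from the example sentences in this article; in the actual solution the classification is done by the MGrammar grammar, not by code like this.

```python
import re

# Hypothetical keyword sets, taken from the example sentences in this article.
PRIMITIVES = {"today", "tomorrow", "yesterday"}
PREFIX_OPERATORS = {"next", "last", "previous"}
POSTFIX_OPERATORS = {"later", "after", "before", "ago", "back"}
ABSOLUTE_DATE = re.compile(r"\d{2}/\d{2}/\d{4}")  # mm/dd/yyyy only


def is_operand(word: str) -> bool:
    """An explicit operand is a date primitive or an absolute date."""
    return word in PRIMITIVES or ABSOLUTE_DATE.fullmatch(word) is not None


def classify(sentence: str) -> str:
    """Rough keyword-based classification of a date expression."""
    words = sentence.strip().lower().split()
    if not words:
        return "unknown"
    if len(words) == 1 and ABSOLUTE_DATE.fullmatch(words[0]):
        return "absolute"
    if len(words) == 1 and words[0] in PRIMITIVES:
        return "primitive"
    if is_operand(words[-1]):
        # Explicit operand at the end, e.g. "5 days from today".
        return "binary"
    if words[0] in PREFIX_OPERATORS or words[-1] in POSTFIX_OPERATORS:
        # The operand is implied and taken to be today.
        return "unary"
    return "unknown"


for s in ["11/20/2008", "today", "next Monday", "5 days later",
          "10 days before yesterday", "day after tomorrow"]:
    print(f"{s!r}: {classify(s)}")
```

Note that “day after tomorrow” lands in the binary category: it reads as one day (units) after (operator) tomorrow (operand).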

Let us examine the unary and binary expressions further.

| | Unary Date Expressions | Binary Date Expressions |
|---|---|---|
| Operators in postfix form | “5 days later”, “1 year ago”, “a year later”, “5 months before” | “5 days from today”, “10 days before yesterday” |
| Operators in prefix form | “next Monday”, “next year”, “last month”, “last week” | |

The above table lists examples of sentences one might use when filling out a date field. It should be evident why each sentence falls into its category. In each sentence, an operator (later, ago, before, from, next, last) applies to an operand (today, yesterday). An operand represents the date relative to which the operator applies a certain number of units (5 days, 1 year, a year). Unary expressions have an operand too, only it is implied: when there is no operand, the operation is assumed to be relative to today. Unary operators come in two flavors: sentences where the operation reads like a postfix operation (“5 days later”) and sentences where a prefix form seems more natural (“next Monday”).

In an effort to formalize these ideas let us define some core concepts

  • Date Primitive – One can think of this as a basis. For the most part, the one date primitive we end up using is the concept of ‘Today’. Everything in this date DSL is relative to this one point in time.
  • Absolute Date – This is any date literal; for simplicity's sake, we only need to worry about dates in the format mm/dd/yyyy.
  • Calculated Date – This is any date calculated relative to a basis. It consists of an operand, an operator (add or subtract), and a units or relative-units component that represents ‘how much’.
    • Operand - In a calculated expression, the operand forms the basis. The basis can be an absolute date or a primitive date; calculated dates are not allowed as a basis, with the exception of “yesterday” and “tomorrow”.
    • Units – This is the absolute number of days, weeks, months or years. Units carry a unit of measure, so we represent them as NumberOf Days, Weeks, Months or Years, where the value is an absolute quantity.
    • Relative Units – We use relative units for date expressions that denote a relative number of days or months, counted from ‘today’, to or from a particular day of the week or month of the year, e.g. next Monday or last January. Relative units are represented in terms of NumberOf and DaysToOrFrom or MonthsToOrFrom, so “next Monday” translates to NumberOf DaysToOrFrom “Monday”.
    • Operator - This can be either Add (+) or Subtract (-).
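The concepts above can be sketched as a small set of Python types. This is an illustrative model under our own assumptions (the class and field names are hypothetical), not the MSchema an Oslo application would use. It encodes the rule that a calculated date may serve as a basis only when it is “yesterday” or “tomorrow”.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DatePrimitive:
    name: str = "today"  # the one primitive the DSL really needs


@dataclass(frozen=True)
class AbsoluteDate:
    month: int
    day: int
    year: int


@dataclass(frozen=True)
class Units:
    kind: str   # "Days", "Weeks", "Months" or "Years"
    count: int  # absolute quantity


@dataclass(frozen=True)
class RelativeUnits:
    kind: str    # "DaysToOrFrom" or "MonthsToOrFrom"
    target: str  # e.g. "Monday" or "January"


@dataclass(frozen=True)
class CalculatedDate:
    amount: Union[Units, RelativeUnits]
    operator: str  # "+" (add) or "-" (subtract)
    operand: Union[DatePrimitive, AbsoluteDate, CalculatedDate]

    def __post_init__(self) -> None:
        if self.operator not in ("+", "-"):
            raise ValueError(f"operator must be '+' or '-', got {self.operator!r}")
        # A calculated date may act as a basis only if it is yesterday or tomorrow.
        if isinstance(self.operand, CalculatedDate) and not self.operand.is_adjacent_day():
            raise ValueError("only 'yesterday' or 'tomorrow' may serve as a calculated basis")

    def is_adjacent_day(self) -> bool:
        """True for 'yesterday' (today - 1 day) and 'tomorrow' (today + 1 day)."""
        return self.amount == Units("Days", 1) and self.operand == DatePrimitive("today")


TODAY = DatePrimitive("today")
YESTERDAY = CalculatedDate(Units("Days", 1), "-", TODAY)
# "10 days before yesterday"
example = CalculatedDate(Units("Days", 10), "-", YESTERDAY)
```

Trying to nest any other calculated date, e.g. “5 days after 10 days before yesterday”, raises a ValueError.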

Much of the effort in formalizing these concepts is so that the MGraph produced from natural-language input also reads well for an application that is aware of our DSL. Let us take a few examples of how we would like to consume the parser output and build the language from there. Although the illustrations below are actual MGraphs produced by running inputs through our parser (using the Intellipad tool), they are meant to show the end state we are working towards.

Main[
  DatePrimitive[
    "today"
  ]
]

Here's what we’d like to see when the input is “today”. Because today is the point in time that acts as the basis for everything else, it has a special meaning, and hence we treat this value as a primitive.
Main[
  AbsoluteDate[
    Month[
      "11"
    ],
    "/",
    Day[
      "01"
    ],
    "/",
    Year[
      "2008"
    ]
  ]
] 

Given our assumption that dates are always in mm/dd/yyyy, we represent an absolute date as a graph with Month, Day and Year properties. The graph above is the output when the input is 11/01/2008.

Main[
  CalculatedDate[
    NumberOf[
      DaysToOrFrom[
        "Monday"
      ]
    ],
    Operator[
      "-"
    ],
    DatePrimitive[
      "today"
    ]
  ]
] 

When the input is “last Monday”, we render the graph as a calculated date that subtracts from today the number of days needed to reach the previous Monday.
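The “how many days back to the previous Monday” arithmetic can be sketched in Python as follows. The helper name `days_to_or_from` is hypothetical, and so is one policy decision: the result is always between 1 and 7, so asking for “last Monday” on a Monday yields the Monday a week earlier rather than today.

```python
import datetime

# Index matches date.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def days_to_or_from(today: datetime.date, weekday: str, operator: str) -> int:
    """Days to move from `today` to reach `weekday` in the direction of `operator`.

    "+" looks forward ("next Monday"), "-" looks back ("last Monday").
    Assumption: the result is always 1..7, never 0.
    """
    target = WEEKDAYS.index(weekday.lower())
    if operator == "+":
        n = (target - today.weekday()) % 7
    else:
        n = (today.weekday() - target) % 7
    return n or 7


today = datetime.date(2008, 11, 20)          # a Thursday
n = days_to_or_from(today, "Monday", "-")    # NumberOf DaysToOrFrom "Monday"
last_monday = today - datetime.timedelta(days=n)
```

On Thursday 11/20/2008, n is 3, so “last Monday” is 11/17/2008.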

Main[
  CalculatedDate[
    NumberOf[
      Days[
        1
      ]
    ],
    Operator[
      "+"
    ],
    DatePrimitive[
      "today"
    ]
  ]
] 

Everything else is represented as a calculated date with a number of days, weeks, months or years relative to “today”. The graph above is the output when the input is “tomorrow”.

Solution

Without going into the details of the MGrammar language specification, and assuming basic token definitions in our mini-language, the top-level grammar for our language is excerpted below.

module MyDSL
{
    @{CaseSensitive[false]}
    language Dates
    {
        syntax Main = d:DateExpression;
        syntax DateExpression =  a:AbsoluteOrPrimitiveDateExpression => a|
                                u:UnaryDateExpression => u|
                                b:BinaryDateExpression => b;
        syntax BinaryDateExpression = q:UnitOrUnits o:Operator a:AbsoluteOrPrimitiveDateExpression
                                        => CalculatedDate[NumberOf[valuesof(q)],
                                                          Operator[o],
                                                          valuesof(a)];
        syntax UnaryDateExpression =
                                o:PrefixUnaryOperator p:Unit
                                    => CalculatedDate[NumberOf[p],
                                                      o,
                                                      DatePrimitive["today"]] |
                                q:UnitOrUnits o:PostfixUnaryOperator
                                    => CalculatedDate[NumberOf[valuesof(q)],
                                                      o,
                                                      DatePrimitive["today"]];
        syntax AbsoluteOrPrimitiveDateExpression = a:AbsoluteDate => a |
                                                   d:DatePrimitive => d;
        syntax DatePrimitive = Yesterday =>  CalculatedDate[NumberOf[ Days[1]],
                                                                      Operator["-"],
                                                                      DatePrimitive["today"]] |
                               Today => DatePrimitive["today"] |
                               Tomorrow =>  CalculatedDate[NumberOf[ Days[1]],
                                                            Operator["+"],
                                                            DatePrimitive["today"]];
        syntax AbsoluteDate = MM "/" DD "/" YY;
        syntax PrefixUnaryOperator = "Next" => Operator["+"] |
                                ("Previous" | "Last") => Operator["-"];
        syntax PostfixUnaryOperator = ("Later" | "After") => Operator["+"] |
                                ("Before"|"Ago"|"Back") => Operator["-"];

        syntax Operator = a:AfterToken => "+" |
                          b:BeforeToken => "-";
        syntax UnitOrUnits = Unit | Units;
        syntax Unit = OneToken? Day => Days[1] |
                        OneToken? Year => Years[1] |
                        OneToken? Month => Months[1] |
                        OneToken? Week => Weeks[1] |
                        d:DayOfTheWeek => DaysToOrFrom[d] |
                        d:MonthsOfTheYear => MonthsToOrFrom[d];
        syntax Units = n:NumberOf Days => Days[n] |
                             n:NumberOf Months => Months[n] |
                             n:NumberOf Years => Years[n] |
                             n:NumberOf Weeks => Weeks[n];

        ...

        //Excluded for brevity

        ...

        interleave Whitespace = " " | "\r" | "\n";
    }
}
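For readers without the Oslo SDK at hand, the following hand-written Python sketch approximates what the grammar above produces for a handful of inputs. It is not what Mg generates. The token sets (OneToken, AfterToken, BeforeToken and the weekday and month words) are assumptions, because their definitions are excluded above, and the graph is modeled as nested tuples of the form (label, children...). The sketch emits the plural labels Days, Weeks, Months and Years for every unit, matching the Units concept described earlier, and lowercases the input to mimic CaseSensitive[false].

```python
import re

# Assumed token sets: the grammar's token definitions are excluded for brevity.
ONE = {"a", "an", "one", "1"}                              # OneToken
SINGULAR = {"day": "Days", "week": "Weeks", "month": "Months", "year": "Years"}
PLURAL = {"days": "Days", "weeks": "Weeks", "months": "Months", "years": "Years"}
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
PREFIX_OPS = {"next": "+", "previous": "-", "last": "-"}
POSTFIX_OPS = {"later": "+", "after": "+", "before": "-", "ago": "-", "back": "-"}
BINARY_OPS = {"after": "+", "from": "+", "before": "-"}   # AfterToken / BeforeToken
ABSOLUTE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")          # MM "/" DD "/" YY

TODAY = ("DatePrimitive", "today")


def calculated(amount, sign, operand):
    """Mirror of the CalculatedDate[NumberOf[...], Operator[...], ...] projection."""
    return ("CalculatedDate", ("NumberOf", amount), ("Operator", sign), operand)


PRIMITIVES = {
    "today": TODAY,
    "yesterday": calculated(("Days", 1), "-", TODAY),
    "tomorrow": calculated(("Days", 1), "+", TODAY),
}


def absolute_or_primitive(words):
    """AbsoluteOrPrimitiveDateExpression: a single date literal or primitive."""
    if len(words) != 1:
        return None
    match = ABSOLUTE.fullmatch(words[0])
    if match:
        month, day, year = match.groups()
        return ("AbsoluteDate", ("Month", month), "/", ("Day", day), "/", ("Year", year))
    return PRIMITIVES.get(words[0])


def unit(words):
    """Unit: optional OneToken + singular unit, or a weekday / month name."""
    if len(words) == 2 and words[0] in ONE:
        words = words[1:]
    if len(words) != 1:
        return None
    word = words[0]
    if word in SINGULAR:
        return (SINGULAR[word], 1)
    if word in WEEKDAYS:
        return ("DaysToOrFrom", word)
    if word in MONTHS:
        return ("MonthsToOrFrom", word)
    return None


def units(words):
    """Units: a number followed by a plural unit, e.g. "5 days"."""
    if len(words) == 2 and words[0].isdigit() and words[1] in PLURAL:
        return (PLURAL[words[1]], int(words[0]))
    return None


def translate(text):
    """Translate a date sentence into a nested-tuple stand-in for the MGraph."""
    words = text.lower().split()
    operand = absolute_or_primitive(words)
    if operand:
        return ("Main", operand)
    if words and words[0] in PREFIX_OPS:                      # "next Monday"
        amount = unit(words[1:])
        if amount:
            return ("Main", calculated(amount, PREFIX_OPS[words[0]], TODAY))
    if len(words) >= 3 and words[-2] in BINARY_OPS:          # "5 days from today"
        amount = unit(words[:-2]) or units(words[:-2])
        operand = absolute_or_primitive(words[-1:])
        if amount and operand:
            return ("Main", calculated(amount, BINARY_OPS[words[-2]], operand))
    if words and words[-1] in POSTFIX_OPS:                   # "5 days later"
        amount = unit(words[:-1]) or units(words[:-1])
        if amount:
            return ("Main", calculated(amount, POSTFIX_OPS[words[-1]], TODAY))
    raise ValueError(f"not a date expression: {text!r}")
```

For example, translate("tomorrow") yields the same shape as the “tomorrow” MGraph shown earlier, with ("Days", 1), ("Operator", "+") and ("DatePrimitive", "today").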

Having defined the grammar, consuming the generated graph is fairly straightforward. We use the Mg tool to generate an Mgx file, which is analogous to ‘compiling’ the grammar into a parser. We load this generated parser into our application and use it to parse any input; parsing produces an object graph of the data we’re interested in. The developer workflow using the OSLO SDK tool chain is as follows.

In Step 1 of the workflow, we use Mg.exe, the grammar compiler, to create an image file from our MGrammar definition. The image file is simply a parser serialized to XAML and packaged in the Open Packaging Conventions (OPC) format, which allows the dynamic parser to be loaded at runtime using XAML Services.

In Step 2 of the workflow, the MGrammar compiler loads the image file (1) and creates a parser (2) at runtime. This parser takes any textual input (3) and transforms it into an MGraph (4).

The generated MGraph is traversed using a concrete implementation of IGraphBuilder. This is a fairly mundane task, though not one I find easy to generalize; with some clever XAML trickery, however, that shouldn’t be too hard either.
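The IGraphBuilder traversal itself is .NET code that the article does not show. As a language-neutral sketch of the same idea, the Python below walks a nested-tuple stand-in for the MGraph, of the form (label, children...), and computes the implied date. The month and year arithmetic (clamping Jan 31 + 1 month to the end of February) and the 1..7-day and 1..12-month windows for “next” and “last” are our assumptions; the article does not specify these rules.

```python
import calendar
import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def add_months(d, months):
    """Shift by whole months, clamping the day (Jan 31 + 1 month -> end of February)."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year = d.year + years
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return datetime.date(year, month0 + 1, day)


def shift(base, amount, sign):
    """Apply NumberOf[amount] to `base` in the direction given by Operator[sign]."""
    kind, value = amount
    step = 1 if sign == "+" else -1
    if kind == "Days":
        return base + datetime.timedelta(days=step * value)
    if kind == "Weeks":
        return base + datetime.timedelta(weeks=step * value)
    if kind == "Months":
        return add_months(base, step * value)
    if kind == "Years":
        return add_months(base, step * 12 * value)
    if kind == "DaysToOrFrom":    # next/last <weekday>, always 1..7 days away
        n = (step * (WEEKDAYS.index(value.lower()) - base.weekday())) % 7 or 7
        return base + datetime.timedelta(days=step * n)
    if kind == "MonthsToOrFrom":  # next/last <month>, always 1..12 months away
        n = (step * (MONTHS.index(value.lower()) + 1 - base.month)) % 12 or 12
        return add_months(base, step * n)
    raise ValueError(f"unknown unit {kind!r}")


def evaluate(node, today):
    """Walk a nested-tuple MGraph stand-in and compute the date it denotes."""
    label, *children = node
    if label == "Main":
        return evaluate(children[0], today)
    if label == "DatePrimitive":
        if children != ["today"]:
            raise ValueError(f"unknown primitive {children!r}")
        return today
    if label == "AbsoluteDate":
        # Skip the "/" separators; keep the Month, Day and Year nodes.
        parts = {c[0]: int(c[1]) for c in children if isinstance(c, tuple)}
        return datetime.date(parts["Year"], parts["Month"], parts["Day"])
    if label == "CalculatedDate":
        (_, amount), (_, sign), operand = children
        return shift(evaluate(operand, today), amount, sign)
    raise ValueError(f"unexpected node {label!r}")


today = datetime.date(2008, 11, 20)
tomorrow = ("Main", ("CalculatedDate", ("NumberOf", ("Days", 1)),
                     ("Operator", "+"), ("DatePrimitive", "today")))
print(evaluate(tomorrow, today))  # 2008-11-21
```

Because operands are evaluated recursively, “10 days before yesterday” works without special handling: the inner “yesterday” node is evaluated first and then shifted.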

Conclusion

Mg is a wonderfully simple meta-language. It does have some limitations, mostly due to the lack of extensive documentation. Some of the things I noticed were:

  • I wasn’t able to figure out the difference between an id(“SomeNode”) node and a statically defined node called SomeNode {…}. The Id function (?) is used when the label of a node needs to be dynamic, but I wasn’t able to glean the usage idioms and conventions associated with it when it is used for literals.
  • The language itself supports expressing the graph in terms of primitive .NET types; however, I couldn’t find how to project date literals.
  • I found no way to manipulate projections along the lines of the id function (?). For example, I wanted a rule that normalizes the year from “yy” format to “yyyy” format: if the input is ‘08’, I could not figure out how to tell the generator to prefix the year with ‘20’ in the projection and transform it to ‘2008’.
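Until a projection can express this kind of transformation, one workaround is to normalize the value in the host application after parsing. The sketch below uses hypothetical Python helpers operating on a nested-tuple stand-in for the AbsoluteDate node; the pivot rule (two-digit years below 50 become 20yy, the rest 19yy) is an assumption that goes beyond the single ‘08’ to ‘2008’ case above.

```python
def normalize_year(year: str, pivot: int = 50) -> str:
    """Expand a two-digit year to four digits.

    Assumption: two-digit years below `pivot` become 20yy and the rest 19yy.
    """
    if len(year) == 4 and year.isdigit():
        return year
    if len(year) == 2 and year.isdigit():
        value = int(year)
        return str(2000 + value if value < pivot else 1900 + value)
    raise ValueError(f"unexpected year: {year!r}")


def normalize_absolute_date(node):
    """Return a copy of an AbsoluteDate node with its Year child expanded."""
    label, *children = node
    if label != "AbsoluteDate":
        return node
    fixed = [("Year", normalize_year(c[1])) if isinstance(c, tuple) and c[0] == "Year" else c
             for c in children]
    return (label, *fixed)


node = ("AbsoluteDate", ("Month", "11"), "/", ("Day", "01"), "/", ("Year", "08"))
print(normalize_absolute_date(node))
```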

OSLO is still a maturing platform (still in CTP), and I’m sure Microsoft will have ironed out some of the kinks by the time it is released. OSLO opens up the possibilities of true domain-driven design by allowing different domain representations to be transformed into a standard structured data form. This lets model-aware or model-assisted systems in different domains interact with each other in their own dialects of the DSL, including visual dialects. In the context of this example, one can imagine a grammar for date expressions written in Chinese, perhaps, that produces the same structured representation of the date expression for use by model-driven applications downstream.
