Writing A Textual DSL Using 'OSLO'

Posted by Dilip Krishnan on Dec 08, 2008


Introduction

Microsoft unveiled the building blocks of their “OSLO” vision during the PDC event in Los Angeles. Oslo has three main components:

  • A modeling language, M, for textual DSLs
  • Quadrant, a design surface for graphical DSLs
  • A relational database repository that stores these models

The textual language consists of three core languages, although technically a developer authors in only two of them:

  • MGrammar: defines grammars for Syntax Directed Translation.
  • MSchema: defines schemas for a Semantic Model that model-aware runtimes can use.
  • MGraph: represents the object graph that results from translating a given textual input with a parser defined using MGrammar.

Since a date DSL is the 'hello world' of internal DSLs, I figured it would be a good exercise to write a DSL for dates using Mg, and then write a consumer for the generated MGraph that translates a limited subset of natural language input into a valid date. The article assumes basic familiarity with the MGrammar language.

The problem

The goal is to write a small DSL for expressing dates in natural language. It is NOT to write the best natural language DSL for parsing dates, but to explore the capabilities of the MGrammar language. Here is what it should support, at least in this incarnation:

  • “Today”, “tomorrow”, “day after tomorrow”
  • “Next Monday”, “next year”, “last month”, “last week”
  • “5 days later”, “1 year ago”, “a year later”, “5 months before”
  • “5 days from today”, “10 days before yesterday”
  • And of course it should accept a date literal as well!

Breaking down the problem

We need to break down a sentence that contains a date expression into parts of speech that we can then translate to an MGraph. That graph can then be interpreted as a set of functions or operations that result in an inferred date. The goal of the classification is to map a sentence in subject-object-predicate form to a mathematical notation of operators and operands that can be used to compute the implied date.

With that in mind, we classify a date expression as one of the following:

  • Absolute Date Expression - A date literal. For simplicity we only accept dates in the form ‘mm/dd/yyyy’, e.g. 11/20/2008.
  • Date Primitive Expression - A word or combination of words that behaves like a constant but is not an absolute date, e.g. today, tomorrow, yesterday.
  • Unary Date Expression - One that involves an operator and only one operand.
  • Binary Date Expression - One that involves an operator and two operands.

Let us examine the unary and binary expressions further.

                            Unary Date Expressions    Binary Date Expressions

Operators in postfix form   “5 days later”            “5 days from today”
                            “1 year ago”              “10 days before yesterday”
                            “a year later”
                            “5 months before”

Operators in prefix form    “next Monday”
                            “next year”
                            “last month”
                            “last week”

The table above lists example sentences that one might type when filling out a date field, and it should be evident why each sentence falls into its category. In the original table the sentences are color coded to highlight the data of interest: operators are in blue, operands are in red, and units are in black. An operator applies a certain number of units relative to an operand, which is itself a date. For example, in “10 days before yesterday” the operator is “before”, the operand is “yesterday” and the units are “10 days”. The unary expressions have an operand too, only it is implied: when there is no explicit operand, the operation is assumed to be relative to today. The unary operators come in two flavors: those where the sentence reads like a postfix operation, and those where a prefix representation seems more natural.

In an effort to formalize these ideas, let us define some core concepts:

  • Date Primitive – One can think of this as a basis. For the most part, the one date primitive we end up using is the concept of ‘Today’. Everything in this Date DSL is relative to this one point in time.
  • Absolute Date – Any date literal. For simplicity's sake we only need to worry about dates in the format mm/dd/yyyy.
  • Calculated Date – Any date that is calculated relative to a basis. It consists of an operand, an operator (which can be add or subtract), and a units or relative units component that represents the ‘how much’.
    • Operand - In a calculated expression, the operand forms the basis. The basis can be an absolute date or a date primitive. Calculated dates are not allowed as a basis, with the exception of “yesterday” and “tomorrow”.
    • Units – An absolute number of days, weeks, months or years. Units carry a unit of measure, so we represent them as NumberOf Days, Weeks, Months or Years, where the value represents an absolute quantity.
    • Relative Units – We use relative units for date expressions that are a relative number of days or months to or from a particular day of the week or month of the year, counted from ‘today’, e.g. next Monday or last January. Relative units are represented in terms of NumberOf and DaysToOrFrom or MonthsToOrFrom. So “next Monday” translates to NumberOf DaysToOrFrom “Monday”.
    • Operator - This can be either Add (+) or Subtract (-).
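
The concepts above can be sketched as plain data types. The following is a minimal illustration in Python, not part of the Oslo SDK; the class and field names are assumptions that simply mirror the vocabulary used in this article.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DatePrimitive:
    name: str  # in this DSL the only basis is "today"


@dataclass(frozen=True)
class AbsoluteDate:
    month: int
    day: int
    year: int


@dataclass(frozen=True)
class Units:
    measure: str  # "Days", "Weeks", "Months" or "Years"
    count: int    # an absolute quantity


@dataclass(frozen=True)
class RelativeUnits:
    measure: str  # "DaysToOrFrom" or "MonthsToOrFrom"
    target: str   # e.g. "Monday" or "January"


@dataclass(frozen=True)
class CalculatedDate:
    number_of: Union[Units, RelativeUnits]
    operator: str  # "+" or "-"
    operand: Union[DatePrimitive, AbsoluteDate, "CalculatedDate"]


# "next Monday": the operand is implied, so it defaults to today.
next_monday = CalculatedDate(
    RelativeUnits("DaysToOrFrom", "Monday"), "+", DatePrimitive("today"))

# "5 days from today": a binary expression with an explicit operand.
five_days_from_today = CalculatedDate(
    Units("Days", 5), "+", DatePrimitive("today"))
```

The operand of a calculated date may itself be a calculated date only to accommodate “yesterday” and “tomorrow”, which this DSL expresses as one day before or after today.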

Much of the effort in formalizing this is so that the translation from natural language also reads well (for an application that is aware of our DSL) once it has been transformed into an MGraph. Let us take a few examples of how we would like to consume the parser output and work towards a language from there. The illustrations below are in fact MGraph produced by translating a given input with our parser (using the Intellipad tool), but for now they simply represent the end state we are working towards.

Main[
  DatePrimitive[
    "today"
  ]
]

Here is what we would like to see when the input is “today”. Since “today” is the point in time that acts as the basis for everything else, we treat it as a primitive.

Main[
  AbsoluteDate[
    Month[
      "11"
    ],
    "/",
    Day[
      "01"
    ],
    "/",
    Year[
      "2008"
    ]
  ]
] 

Given our assumption that dates are always in mm/dd/yyyy form, we represent an absolute date as a graph with Month, Day and Year nodes. The graph above is the output when the input is 11/01/2008.
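
As a rough sketch of how a consumer might turn such a graph into a date value, the Python below models each node as a hypothetical `(label, children)` tuple. This representation and the `absolute_to_date` helper are assumptions made for illustration; they are not the Oslo SDK's API.

```python
from datetime import date

# Hypothetical stand-in for the MGraph shown above: (label, [children]).
absolute = ("AbsoluteDate", [
    ("Month", ["11"]), "/",
    ("Day", ["01"]), "/",
    ("Year", ["2008"]),
])


def absolute_to_date(node):
    """Convert an AbsoluteDate node into a datetime.date."""
    label, children = node
    if label != "AbsoluteDate":
        raise ValueError(f"expected AbsoluteDate, got {label}")
    # Keep the labelled parts and skip the "/" separator literals.
    parts = {child[0]: int(child[1][0])
             for child in children if isinstance(child, tuple)}
    return date(parts["Year"], parts["Month"], parts["Day"])


parsed = absolute_to_date(absolute)  # date(2008, 11, 1)
```

Because `date` validates its arguments, an input such as 13/45/2008 that slips past the grammar would raise a `ValueError` here rather than produce a bogus date.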

Main[
  CalculatedDate[
    NumberOf[
      DaysToOrFrom[
        "Monday"
      ]
    ],
    Operator[
      "-"
    ],
    DatePrimitive[
      "today"
    ]
  ]
] 

When the input is “last Monday”, we render the graph as a calculated date that subtracts from today the number of days needed to reach the most recent Monday before “today”.
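
The arithmetic behind DaysToOrFrom is left to the application that consumes the graph. One possible interpretation, sketched in Python, is shown below. The function names are hypothetical, and the sketch assumes that “last Monday” and “next Monday” never resolve to today itself, even when today is a Monday.

```python
from datetime import date, timedelta

# Index matches date.weekday(): Monday == 0 ... Sunday == 6.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def days_to_or_from(weekday_name, operator, today):
    """Number of days between today and the nearest matching weekday."""
    target = WEEKDAYS.index(weekday_name.lower())
    if operator == "+":
        gap = (target - today.weekday()) % 7
    else:
        gap = (today.weekday() - target) % 7
    # Assumption: a gap of zero means a full week away, not today.
    return gap or 7


def resolve(weekday_name, operator, today):
    """Apply the operator to today using the computed number of days."""
    offset = timedelta(days=days_to_or_from(weekday_name, operator, today))
    return today + offset if operator == "+" else today - offset


today = date(2008, 12, 8)                    # a Monday
last_monday = resolve("Monday", "-", today)  # 2008-12-01
next_friday = resolve("Friday", "+", today)  # 2008-12-12
```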

Main[
  CalculatedDate[
    NumberOf[
      Days[
        1
      ]
    ],
    Operator[
      "+"
    ],
    DatePrimitive[
      "today"
    ]
  ]
] 

Everything else is represented as a calculated date with a NumberOf Days, Weeks, Months or Years relative to “today”. The graph above is the output when the input is “tomorrow”.
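
Applying a NumberOf Days or Weeks to a basis is simple date arithmetic, but Months and Years need a decision about days that do not exist in the target month. The Python sketch below is one way to do it; the helper names are hypothetical, and clamping to the last day of the month is an assumption, since the article does not specify this behavior.

```python
import calendar
from datetime import date, timedelta


def add_months(start, months):
    """Shift a date by whole months, clamping the day to the month's end."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))


def apply_units(basis, operator, measure, count):
    """Evaluate NumberOf[measure[count]] with Operator against a basis."""
    sign = 1 if operator == "+" else -1
    if measure == "Days":
        return basis + timedelta(days=sign * count)
    if measure == "Weeks":
        return basis + timedelta(weeks=sign * count)
    if measure == "Months":
        return add_months(basis, sign * count)
    if measure == "Years":
        return add_months(basis, sign * 12 * count)
    raise ValueError(f"unknown unit of measure: {measure}")


today = date(2008, 12, 8)
tomorrow = apply_units(today, "+", "Days", 1)       # 2008-12-09
a_year_ago = apply_units(today, "-", "Years", 1)    # 2007-12-08
```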

Solution

Without going into the details of the MGrammar language specification, and assuming basic token definitions for our mini-language, the top-level grammar for our language is excerpted below.

module MyDSL
{
    @{CaseSensitive[false]}
    language Dates
    {
        syntax Main = d:DateExpression;
        syntax DateExpression =  a:AbsoluteOrPrimitiveDateExpression => a|
                                u:UnaryDateExpression => u|
                                b:BinaryDateExpression => b;
        syntax BinaryDateExpression = q:UnitOrUnits o:Operator a:AbsoluteOrPrimitiveDateExpression
                                        => CalculatedDate[NumberOf[valuesof(q)],
                                                          Operator[o],
                                                          valuesof(a)];
        syntax UnaryDateExpression =
                                o:PrefixUnaryOperator p:Unit
                                    => CalculatedDate[NumberOf[p],
                                                      o,
                                                      DatePrimitive["today"]] |
                                q:UnitOrUnits o:PostfixUnaryOperator
                                    => CalculatedDate[NumberOf[valuesof(q)],
                                                      o,
                                                      DatePrimitive["today"]];
        syntax AbsoluteOrPrimitiveDateExpression = a:AbsoluteDate => a |
                                                   d:DatePrimitive => d;
        syntax DatePrimitive = Yesterday =>  CalculatedDate[NumberOf[ Days[1]],
                                                                      Operator["-"],
                                                                      DatePrimitive["today"]] |
                               Today => DatePrimitive["today"] |
                               Tomorrow =>  CalculatedDate[NumberOf[ Days[1]],
                                                            Operator["+"],
                                                            DatePrimitive["today"]];
        syntax AbsoluteDate = MM "/" DD "/" YY;
        syntax PrefixUnaryOperator = "Next" => Operator["+"] |
                                ("Previous" | "Last") => Operator["-"];
        syntax PostfixUnaryOperator = ("Later" | "After") => Operator["+"] |
                                ("Before"|"Ago"|"Back") => Operator["-"];

        syntax Operator = a:AfterToken => "+" |
                          b:BeforeToken => "-";
        syntax UnitOrUnits = Unit | Units;
        syntax Unit = OneToken? Day => Days[1] |
                        OneToken? Year => Years[1] |
                        OneToken? Month => Months[1] |
                        OneToken? Week => Weeks[1] |
                        d:DayOfTheWeek => DaysToOrFrom[d] |
                        d:MonthsOfTheYear => MonthsToOrFrom[d];
        syntax Units = n:NumberOf Days => Days[n] |
                             n:NumberOf Months => Months[n] |
                             n:NumberOf Years => Years[n] |
                             n:NumberOf Weeks => Weeks[n];

        ...

        //Excluded for brevity

        ...

        interleave Whitespace = " " | "\r" | "\n";
    }
}

Having defined the grammar, consuming the generated graph is fairly straightforward. We use the Mg tool to generate an Mgx file, which is analogous to ‘compiling’ the grammar, and this gives us a parser. We load the generated parser in our application and use it to parse any input. The parsed input yields an object graph of the data we are interested in. The developer workflow using the OSLO SDK tool chain is as follows.

In Step 1 of the workflow we use Mg.exe, the grammar compiler, to create an image file from our MGrammar definition. The image file is simply a parser serialized to XAML and packaged in the Open Packaging Conventions (OPC) format. This allows the parser to be loaded dynamically at runtime using XAML services.

In Step 2 of the workflow the MGrammar compiler loads up the image file (1) and creates a parser (2) at runtime. This parser takes any textual input (3) and transforms it into an MGraph (4).

Traversing the generated MGraph is done using a concrete implementation of IGraphBuilder. It is a fairly mundane task, but one that I did not find easy to generalize. With some clever XAML trickery, though, that should not be tough to do either.
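
To give a feel for what the traversal has to do, here is a small recursive evaluator in Python. It is only a sketch of the idea: the `(label, children)` tuples are a hypothetical stand-in for the MGraph, the code does not use IGraphBuilder or any other Oslo SDK type, and it handles only Days and Weeks to stay short.

```python
from datetime import date, timedelta

DAYS_PER = {"Days": 1, "Weeks": 7}


def evaluate(node, today):
    """Walk a (label, children) graph and compute the date it describes."""
    label, children = node
    if label == "Main":
        return evaluate(children[0], today)
    if label == "DatePrimitive":
        return today  # "today" is the only primitive in the projections
    if label == "CalculatedDate":
        number_of, operator, operand = children
        measure, (count,) = number_of[1][0]      # e.g. ("Days", [10])
        sign = 1 if operator[1][0] == "+" else -1
        basis = evaluate(operand, today)         # the operand may be nested
        return basis + timedelta(days=sign * count * DAYS_PER[measure])
    raise ValueError(f"unexpected node: {label}")


# "10 days before yesterday", with "yesterday" projected as today - 1 day.
yesterday = ("CalculatedDate", [
    ("NumberOf", [("Days", [1])]),
    ("Operator", ["-"]),
    ("DatePrimitive", ["today"]),
])
graph = ("Main", [("CalculatedDate", [
    ("NumberOf", [("Days", [10])]),
    ("Operator", ["-"]),
    yesterday,
])])

result = evaluate(graph, date(2008, 12, 8))  # 2008-11-27
```

Dispatching on the node label keeps each case small, which is also why the labels chosen in the grammar's projections matter to the consuming application.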

Conclusion

Mg is a wonderfully simple meta-language. It does have some limitations, mostly because of the lack of extensive documentation. Some of the things I noticed were:

  • I wasn’t able to figure out the difference between an id(“SomeNode”) node and a statically defined node called SomeNode {…}. The id function (?) is used when the label of a node needs to be dynamic, but I wasn’t able to glean the usage idioms and conventions associated with it when it is used for literals.
  • The language itself has support for expressing the graph in terms of primitive .NET types; however, I couldn’t find how to project date literals.
  • There was no way to manipulate the projections in a manner similar to the id function (?). For example, I wanted a rule that normalizes the year from a “yy” format to a “yyyy” format: if the input is ‘08’, I could not figure out how to tell the generator to prefix the year with ‘20’ in the projection and transform it to ‘2008’.
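
One workaround for the last point is to normalize the value in the application that consumes the graph rather than in the grammar. A minimal Python sketch follows; the `normalize_year` helper and its default century are assumptions made for illustration.

```python
def normalize_year(text, century=2000):
    """Expand a two-digit year such as "08" to four digits ("2008")."""
    if len(text) == 2 and text.isdigit():
        return str(century + int(text))
    return text  # already four digits, or not something we recognize


expanded = normalize_year("08")      # "2008"
unchanged = normalize_year("2008")   # "2008"
```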

OSLO is still a maturing platform (still in CTP) and I’m sure that by the time it is released Microsoft will have ironed out some of the kinks. OSLO really opens up the possibilities of true domain-driven-design, very simply, by allowing different domain representations to be transformed into a standard structured data form. This affords different model-aware/assisted systems domains interaction with each other in their own dialects of the DSL, including visual dialects. In the context of this example, one can imagine a grammar for date expressions written in Chinese, perhaps, that produces the same structured representation of the date expression, which model-driven applications downstream can then use.

References

Oslo Developer Center

Introducing "Oslo" - Service Station, by Aaron Skonnard.

OSLO: Building Textual DSLs.

Martin Fowler on OSLO

First in-depth look at Microsoft’s Oslo and the “M” modeling language


    Oslo and DDD

    by Colin Jack

    "OSLO really opens up the possibilities of true domain-driven-design, very simply, by allowing different domain representations to be transformed into a standard structured data form. This affords different model-aware/assisted systems domains interaction with each other in their own dialects of the DSL, including visual dialects"

    Do you know of any examples? From what I've seen, this approach is a bit of a red herring. I'm not saying it's not possible, but are you saying we should completely do away with a normal object-oriented domain model and do everything in Oslo?


    Re: Oslo and DDD

    by Dilip Krishnan

    IMHO, what OSLO is trying to do is get us closer to that vision. I don't know of any examples of software that are completely model-driven (in the classic sense) and I'm guessing it'll be a long time before we do see them in the wild. But I do know a lot of the software we write is certainly model aware/assisted. It's a very subtle difference between the two. To illustrate with an example, consider a configuration file that an application uses to load up the required settings/environment; this can simply be thought of as a model, specifically a domain model for the application settings/environment.

    Now there is a plethora of such tiny domain models scattered across an application, consumed by different pieces of it. At the most basic level OSLO tries to solve the problem of configuration at a platform level (model assisted). When we take this notion to the next level, where the application runtime is aware of these models as a whole, we get a different class of applications that is model aware. This is not to say that no object-oriented development is required, as one still needs to create the application runtime. OSLO gives a standard way to define these models and store/retrieve them.
