InfoQ Homepage Articles Expression as a Compiler

Expression as a Compiler

Aug 10, 2009 13 min read

InfoQ Article Contest

Share your knowledge Win a ticket to a QCon event
or an InfoQ Dev SummitFind out more

Introduction: the problem

Reflection, like it or not, it is inevitable that eventually you will have to write some code that involves looking at the members of a type at runtime (rather than at compile time). Maybe you are trying to write utility validation / serialization / ORM code, or maybe the interesting property / method is specified at runtime in a configuration file or from the database. Whatever the cause, it is likely that at some point you’ve written code involving GetType() – something like:

Type type = obj.GetType(); 
foreach (var property in type.GetProperties())
 
{ 

    Console.WriteLine("{0} = {1}", 

       property.Name, 

       property.GetValue(obj, null)); 

}

While this trivial example may work, it isn’t ideal – there are a number of fundamental problems:

It is relatively slow – which is fine if you use it sparingly, but is very noticeable when used in a tight loop
There is no convenient way of bundling up a set of prepared operations into reusable code

This is only exacerbated when the dynamic code gets more complex.

Of course, an obvious answer to the above is “don’t use reflection” – i.e. manually (or perhaps via code-generation tools) write a method for each type that requires it that does what we need. This naive approach is fine on the surface, but leads to massive duplication, and still doesn’t help us handle types that we don’t know about at compile time.

What we need is some form of meta-programming API. It is fortunate, then, that .NET 2.0 ships with Reflection.Emit for writing IL (the equivalent of Java’s “bytecode” in .NET) on the fly, but it forces the developer to handle all the nuances of boxing, type conversions, call-stack details, operator “lifting”, etc – in short, a lot more of the CLI than you really wanted to know about. Another option is CodeDom, but that suffers from its intended use in code-generation, and what we are trying to do quickly gets swallowed by the chore of presenting it to CodeDom.

What we need is a half-way house: that is at a high-enough level that we don’t need to worry about the actual IL, but which isn’t too high: we want our code to be as simple and expressive as possible.

Background: Enter System.Linq.Expressions.Expression

In .NET 3.5, Microsoft introduced LINQ. A key part of LINQ (in particular when talking to external data sources such as databases) is the concept of expression code as expression trees. This is used extensively, for example to indicate how we should filter data:

var query = from cust in customers

            where cust.Region == "North"

            select cust;

It isn’t obvious in the code, but that is identical to the same code written with a lambda expression:

var query = customers.Where(cust => cust.Region == "North");

The key to the power of LINQ is the lambda statement – and the precise details of the Where method. In LINQ-to-Objects, the Where method accepts a Func<T, bool> - i.e. a delegate that can be evaluated for each object (T) to return true (to include the record) or false (to exclude the record). However, with sources such as databases, the Where method accepts an Expression<Func<T,bool>>. Rather than a delegate, this is an expression tree that represents the test to perform.

The key point here is that we can build our own expression trees to represent a wide range of scenarios – and expression trees have the inbuilt ability to be compiled to a typed delegate. This gives us a simple way of writing IL at runtime.

So what is an expression tree?

Unlike regular C# that is compiled to IL, a lambda expression is compiled to code that creates an object model that represents our code. This allows code such as database providers to look at this object model, understand the intent of what we are trying to do, and translate that intent as necessary (for example, writing TSQL).

If we look at the previous lambda expression in isolation:

Expression<Func<Customer, bool>> filter =
    cust => cust.Region == "North";

To see what the compiler does, we need to compile example code (as a lambda expression) as normal, and then look in a tool such as the excellent (and free) Reflector – but first we need to turn off the .NET 3.5 (C# 3.0) optimisations :

Then looking at the output of the disassembler, we can see something like the C# we would need to write it ourselves:

Note, however, that the compiler uses direct MemberInfo access, and illegal variable names – so you should not expect the output to compile directly – it is just for reference. Actually, it is a curiosity that the C# language specification doesn’t actually indicate how the compiler will translate your code to an expression tree, so using the compiler as a reference implementation is one of the few ways of investigating Expression.

For the purposes of writing this manually as an expression tree, it helps to think of things like equality tests (==) and member-access (.) as regular methods that accept operands:

Expression<Func<Customer, bool>> filter =
    cust => Equal(Property(cust,"Region"),"North");

We can now build an equivalent expression tree that accepts a single parameter of type Customer, and returns a boolean by comparing the Region property of this parameter against the string constant “North”:

// declare a parameter of type Customer named cust

ParameterExpression cust = Expression.Parameter(

    typeof(Customer), "cust");

// compare (equality) the Region property of the

// parameter against the string constant "North"

BinaryExpression body = Expression.Equal(

    Expression.Property(cust, "Region"),

    Expression.Constant("North", typeof(string)));

// formalise this as a lambda

Expression<Func<Customer, bool>> filter =

    Expression.Lambda<Func<Customer, bool>>(body, cust);

The final “lambda” line simply formalises the expression into a fully standalone unit, with a defined set of parameters (in this case one) that can be used inside the expression tree. If we attempt to use a parameter inside a lambda that isn’t defined, an exception will be thrown.

The expression tree is simply an immutable object-model that represents our intent:

filter (lambda)
- parameter: Customer: “cust”
- body (binary)
  - method: equals
  - left (member)
    - member: “Region”
    - expression: “cust” parameter
  - right (constant)
    - string: “North”

Once we have a complete expression tree, we can either use that “as is” with LINQ providers (who will then tease apart what we mean from the tree), or we can compile it to a delegate – generating the necessary IL on the fly:

Func<Customer, bool> filterFunc = filter.Compile();

We can now use our delegate by passing in a Customer instance, returning a bool.

This might look a little verbose, but it is a lot simpler than the code we would require to represent the same via Reflection.Emit, especially given complications like boxing.

IMPORTANT: compiling an expression tree to a lambda involves dynamic code generation. To get the optimum performance, you would compile this once – and store it (for example in a field) to be re-used many times.

Doing interesting things with Expression

So far, we’ve only seen a simple example of something that we could already do in regular C# - which isn’t very impressive. So let’s look at something we can’t do in C#... I regularly see questions from people trying to find the right syntax to write a generic “add” (etc) method to perform arithmetic on arbitrary types. Simply: there is no such syntax. However, we can achieve this with Expression.

For brevity, I’ll use “var” rather than explicit variable types, but we’ll illustrate how we can easily cache the delegate instance for re-use:

public static class Operator<T>
{
    private static readonly Func<T, T, T> add;
    public static T Add(T x, T y)
    {
        return add(x, y);
    }
    static Operator()
    {
        var x = Expression.Parameter(typeof(T), "x");
        var y = Expression.Parameter(typeof(T), "y");
        var body = Expression.Add(x, y);
        add = Expression.Lambda<Func<T, T, T>>(
            body, x, y).Compile();
    }
}

Because we compile the delegate in the static constructor, it only happens once per type T, and is then re-used – so in any calling code we can now use Operator<T>. Interestingly, the simple “Add” method hides a great deal of complexity: value- vs reference-types, primitive operators (with raw IL counterparts), bespoke operators (methods added as part of a type definition) and lifted operators (operators provided automatically against Nullable<T>).

This works even for user-defined types - as long as you have defined the + operator; if no suitable operator can be found then it will throw an exception at runtime. In reality this is rarely an issue, as you are unlikely to try using Add unless you believe the type has a sensible operator.

A full implementation of generic operators is freely provided in the MiscUtil library.

What does Expression support?

In .NET 3.5, Expression supports a wide range of the operations required to either query data or to create new objects:

Arithmetic: Add, AddChecked, Divide, Modulo, Multiply, MultiplyChecked, Negate, NegateChecked, Power, Subtract, SubtractChecked, UnaryPlus
Creation: Bind, ElementInit, ListBind, ListInit, MemberBind, MemberInit, New, NewArrayBounds, NewArrayInit
Bitwise: And, ExclusiveOr, LeftShift (<<), Not, Or, RightShift (>>)
Logical: AndAlso (&&), Condition (? :), Equal, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, NotEqual, OrElse (||), TypeIs
Member Access: ArrayIndex, ArrayLength, Call, Field, Property, PropertyOrField
Other: Convert, ConvertChecked, Coalesce (??), Constant, Invoke, Lambda, Parameter, TypeAs, Quote

For example, we could implement a shallow clone looking at the public members of a type – essentially writing (on-the-fly) an implementation like:

person => new Person 

{  

    Name = person.Name,  

    DateOfBirth = person.DateOfBirth, ...  

};

To do this, we need the MemberInit method:

private static readonly Func<T, T> shallowClone; 

static Operator() 

{ 

     var orig = Expression.Parameter(typeof(T), "orig");
 

     // for each read/write property on T, create a 


     // new binding (for the object initializer) that 

     // copies the original's value into the new object 

     var setProps = from prop in typeof(T).GetProperties (
 

                         BindingFlags.Public | BindingFlags.Instance) 

                    where prop.CanRead && prop.CanWrite 

                    select (MemberBinding) Expression.Bind( 

                         prop, Expression.Property(orig, prop)); 


     var body = Expression.MemberInit( // object initializer 

         Expression.New(typeof(T)), // ctor 

         setProps // property assignments 

     ); 


     shallowClone = Expression.Lambda<Func<T, T>>( 

         body, orig).Compile(); 

}

This will out-perform standard reflection (on an object-by-object basis) by a huge margin, without the need to maintain a separate clone method per type and the risk of forgetting to add new properties to the clone.

A similar approach might be used, for example, to map data between DTO entities and entity objects, or between DataTable and class representations of objects – or perhaps to compare two objects for property equality – writing something like:

(x,y) => x.Name == y.Name &&
 
   x.DateOfBirth == y.DateOfBirth && ...;

To do this, we need to use AndAlso to combine each operation:

private static readonly Func<T, T, bool> propertiesEqual;
static Operator()
{
   var x = Expression.Parameter(typeof(T), "x"); 

   var y = Expression.Parameter(typeof(T), "y"); 


   var readableProps =
 
       from prop in typeof(T).GetProperties( 

           BindingFlags.Public | BindingFlags.Instance) 

       where prop.CanRead 

       select prop; 


    Expression combination = null; 

    foreach (var prop in readableProps) 

    { 

       var thisPropEqual = Expression.Equal( 

           Expression.Property(x, prop), 

           Expression.Property(y, prop)); 


       if (combination == null) 

       { // first 

          combination = thisPropEqual; 

       } 

       else 

       { // combine via && 

            combination = Expression.AndAlso( 
                   combination, thisPropEqual); 

       } 

   } 


   if (combination == null) 
   { // nothing to test; return true 

       propertiesEqual = delegate { return true; }; 
   } 

   else 
   { 
       propertiesEqual = Expression.Lambda<Func<T, T, bool>>(
              combination, x, y).Compile();
   } 

}

Limitations of Expression – and the future

So far, Expression has seemed pretty versatile – however, there are some significant limitations that relate to Expression’s LINQ origins:

There is no inbuilt mechanism for changing object properties / fields
There is no way of performing a series of arbitrary operations

In the above example, we got away with performing multiple steps because we could chain all of our AndAlso into a single logical statement. However, there is simply no way (in .NET 3.5) of expressing something like the following in an expression tree (for use as a compiler):

person.DateOfBirth = newDob; 

person.Name = newName; 

person.UpdateFriends(); 

person.Save();

The Bind operation we saw earlier only works when creating new objects. We could obtain the setter method, but we also have no way to chain method calls together (except as part of a “fluent” API, which this isn’t).

Fortunately, .NET 4.0 extends the Expression API with additional types and methods, in order to support the Dynamic Language Runtime (DLR). This covers a much broader spectrum of typical code:

Mutation: AddAssign, AddAssignChecked, AndAssign, Assign, DivideAssign, ExclusiveOrAssign, LeftShiftAssign, ModuloAssign, MultiplyAssign, MultiplyAssignChecked, OrAssign, PostDecrementAssign, PostIncrementAssign, PowerAssign, PreDecrementAssign, PreIncrementAssign, RightShiftAssign, SubtractAssign, SubtractAssignChecked
Arithmetic: Decrement, Default, Increment, OnesComplement
Member Access: ArrayAccess, Dynamic
Logical: ReferenceEqual, ReferenceNotEqual, TypeEqual
Flow: Block, Break, Continue, Empty, Goto, IfThen, IfThenElse, IfFalse, IfTrue, Label, Loop, Return, Switch, SwitchCase, Unbox, Variable
Exceptions: Catch, Rethrow, Throw
Debug: ClearDebugInfo, DebugInfo

This allows much greater use when writing dynamic code (indeed, it even includes the facility to compile into a MethodBuilder for use in runtime type generation). For example, to write a simple “for” loop that prints the numbers 0 thru 9:

            var exitFor = Expression.Label("exitFor"); // jump label

            var x = Expression.Variable(typeof(int), "x");

            var body = Expression.Block(new[] { x }, // declare scope variables

                    Expression.Assign(x,

                        Expression.Constant(0, typeof(int))), // init

                    Expression.Loop(

                        Expression.IfThenElse(

                            Expression.GreaterThanOrEqual( // test for exit

                                x,

                                Expression.Constant(10, typeof(int))

                            ),

                            Expression.Break(exitFor), // perform exit

                            Expression.Block( // perform code

                                Expression.Call(

                                    typeof(Console), "WriteLine", null, x),

                                Expression.PostIncrementAssign(x)

                            )

                        ), exitFor));



            var runtimeLoop = Expression.Lambda<Action>(body).Compile();

Which might not seem hugely impressive – but for writing compiled code at runtime, this is a lot more convenient and flexible than the alternatives.

Earlier we looked at how to clone an object using properties – but we now have the ability to push data from one object into another – very useful with ORM tools where we might need to obtain a change-tracked object, update it with external changes (from either a DTO or a detached instance), and submit the changes. We can do this as follows:

            var source = Expression.Parameter(typeof(T), "source");

            var dest = Expression.Parameter(typeof(T), "dest");



            // for each read/write property, copy the source's value

            // into the destination object

            var body = Expression.Block(

                from prop in typeof(T).GetProperties(

                    BindingFlags.Public | BindingFlags.Instance)

                where prop.CanRead && prop.CanWrite

                select Expression.Assign(

                    Expression.Property(dest, prop),

                    Expression.Property(source, prop)));



            var copyMethod = Expression.Lambda<Action<T,T>>(

                body, source, dest).Compile();

Expressions with runtime Type creation

Another interesting feature in .NET 4.0 is that in addition to compiling an Expression to a standalone delegate, you can now use an expression tree to provide the body of a MethodBuilder (used when creating new Types at runtime), via the new CompileToMethod. This hints at a future where we can write rich types at runtime (including polymorphism and interface implementation) without ever having to touch raw IL.

Advanced Expressions in the language

Although the runtime now supports much more sophisticated expression trees, there is no additional language support for this in C# 4.0 – so the only way to create this type of expression is manually. This isn’t actually an issue, as no LINQ provider will support these expressions, and if you know the code at compile-time, you’d just write a regular method / anonymous method. It does leave some interesting possibilities for post-4.0, though. For example (and this is pure speculation), imagine if the language could use Expression.Assign and Expression.Block to build an expression tree for use with a database, something like:

        // imaginary code; this doesn’t work

        myDatabase.Customers

             .Where(c => c.Region == "North")

             .Update(c => {

                 c.Manager = "Fred";

                 c.Priority = c.Priority + 10;

             });

I stress that this is just “what if?” – it would require a huge amount of work to implement by the LINQ providers, but as we’ve shown above: in theory the Expression object is capable of encapsulating our intent (to update multiple properties of an entity). Major language and framework changes don’t come easily though; only time will tell.

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?

Expression as a Compiler

InfoQ Article Contest

Introduction: the problem

Related Sponsored Content

Background: Enter System.Linq.Expressions.Expression

So what is an expression tree?

Doing interesting things with Expression

What does Expression support?

Limitations of Expression – and the future

Expressions with runtime Type creation

Advanced Expressions in the language

Rate this Article

This content is in the Microsoft topic

Related Topics:

Related Editorial

Popular across InfoQ

The InfoQ Newsletter