BT
x Your opinion matters! Please fill in the InfoQ Survey about your reading habits!

Expression as a Compiler

Posted by Marc Gravell on Aug 10, 2009 |

Introduction: the problem

Reflection, like it or not, it is inevitable that eventually you will have to write some code that involves looking at the members of a type at runtime (rather than at compile time). Maybe you are trying to write utility validation / serialization / ORM code, or maybe the interesting property / method is specified at runtime in a configuration file or from the database. Whatever the cause, it is likely that at some point you’ve written code involving GetType() – something like:
Type type = obj.GetType(); 
foreach (var property in type.GetProperties())
 
{ 

    Console.WriteLine("{0} = {1}", 

       property.Name, 

       property.GetValue(obj, null)); 

} 
While this trivial example may work, it isn’t ideal – there are a number of fundamental problems:
  • It is relatively slow – which is fine if you use it sparingly, but is very noticeable when used in a tight loop
  • There is no convenient way of bundling up a set of prepared operations into reusable code

This is only exacerbated when the dynamic code gets more complex.

Of course, an obvious answer to the above is “don’t use reflection” – i.e. manually (or perhaps via code-generation tools) write a method for each type that requires it that does what we need. This naive approach is fine on the surface, but leads to massive duplication, and still doesn’t help us handle types that we don’t know about at compile time.

What we need is some form of meta-programming API. It is fortunate, then, that .NET 2.0 ships with Reflection.Emit for writing IL (the equivalent of Java’s “bytecode” in .NET) on the fly, but it forces the developer to handle all the nuances of boxing, type conversions, call-stack details, operator “lifting”, etc – in short, a lot more of the CLI than you really wanted to know about. Another option is CodeDom, but that suffers from its intended use in code-generation, and what we are trying to do quickly gets swallowed by the chore of presenting it to CodeDom.

What we need is a half-way house: that is at a high-enough level that we don’t need to worry about the actual IL, but which isn’t too high: we want our code to be as simple and expressive as possible.

Background: Enter System.Linq.Expressions.Expression

In .NET 3.5, Microsoft introduced LINQ. A key part of LINQ (in particular when talking to external data sources such as databases) is the concept of expression code as expression trees. This is used extensively, for example to indicate how we should filter data:

var query = from cust in customers

            where cust.Region == "North"

            select cust;

It isn’t obvious in the code, but that is identical to the same code written with a lambda expression:

var query = customers.Where(cust => cust.Region == "North");

The key to the power of LINQ is the lambda statement – and the precise details of the Where method. In LINQ-to-Objects, the Where method accepts a Func<T, bool> - i.e. a delegate that can be evaluated for each object (T) to return true (to include the record) or false (to exclude the record). However, with sources such as databases, the Where method accepts an Expression<Func<T,bool>>. Rather than a delegate, this is an expression tree that represents the test to perform.

The key point here is that we can build our own expression trees to represent a wide range of scenarios – and expression trees have the inbuilt ability to be compiled to a typed delegate. This gives us a simple way of writing IL at runtime.

So what is an expression tree?

Unlike regular C# that is compiled to IL, a lambda expression is compiled to code that creates an object model that represents our code. This allows code such as database providers to look at this object model, understand the intent of what we are trying to do, and translate that intent as necessary (for example, writing TSQL).

If we look at the previous lambda expression in isolation:

Expression<Func<Customer, bool>> filter =
    cust => cust.Region == "North";

To see what the compiler does, we need to compile example code (as a lambda expression) as normal, and then look in a tool such as the excellent (and free) Reflector – but first we need to turn off the .NET 3.5 (C# 3.0) optimisations :

Then looking at the output of the disassembler, we can see something like the C# we would need to write it ourselves:

Note, however, that the compiler uses direct MemberInfo access, and illegal variable names – so you should not expect the output to compile directly – it is just for reference. Actually, it is a curiosity that the C# language specification doesn’t actually indicate how the compiler will translate your code to an expression tree, so using the compiler as a reference implementation is one of the few ways of investigating Expression.

For the purposes of writing this manually as an expression tree, it helps to think of things like equality tests (==) and member-access (.) as regular methods that accept operands:

Expression<Func<Customer, bool>> filter =
    cust => Equal(Property(cust,"Region"),"North"); 

We can now build an equivalent expression tree that accepts a single parameter of type Customer, and returns a boolean by comparing the Region property of this parameter against the string constant “North”:

// declare a parameter of type Customer named cust

ParameterExpression cust = Expression.Parameter(

    typeof(Customer), "cust");

// compare (equality) the Region property of the

// parameter against the string constant "North"
BinaryExpression body = Expression.Equal( Expression.Property(cust, "Region"), Expression.Constant("North", typeof(string))); // formalise this as a lambda
Expression<Func<Customer, bool>> filter = Expression.Lambda<Func<Customer, bool>>(body, cust);

The final “lambda” line simply formalises the expression into a fully standalone unit, with a defined set of parameters (in this case one) that can be used inside the expression tree. If we attempt to use a parameter inside a lambda that isn’t defined, an exception will be thrown.

The expression tree is simply an immutable object-model that represents our intent:

  • filter (lambda)
    • parameter: Customer: “cust”
    • body (binary)
      • method: equals
      • left (member)
        • member: “Region”
        • expression: “cust” parameter
      • right (constant)
        • string: “North”

Once we have a complete expression tree, we can either use that “as is” with LINQ providers (who will then tease apart what we mean from the tree), or we can compile it to a delegate – generating the necessary IL on the fly:

Func<Customer, bool> filterFunc = filter.Compile(); 

We can now use our delegate by passing in a Customer instance, returning a bool.

This might look a little verbose, but it is a lot simpler than the code we would require to represent the same via Reflection.Emit, especially given complications like boxing.

IMPORTANT: compiling an expression tree to a lambda involves dynamic code generation. To get the optimum performance, you would compile this once – and store it (for example in a field) to be re-used many times.

Doing interesting things with Expression

So far, we’ve only seen a simple example of something that we could already do in regular C# - which isn’t very impressive. So let’s look at something we can’t do in C#... I regularly see questions from people trying to find the right syntax to write a generic “add” (etc) method to perform arithmetic on arbitrary types. Simply: there is no such syntax. However, we can achieve this with Expression.

For brevity, I’ll use “var” rather than explicit variable types, but we’ll illustrate how we can easily cache the delegate instance for re-use:

public static class Operator<T>
{
    private static readonly Func<T, T, T> add;
    public static T Add(T x, T y)
    {
        return add(x, y);
    }
    static Operator()
    {
        var x = Expression.Parameter(typeof(T), "x");
        var y = Expression.Parameter(typeof(T), "y");
        var body = Expression.Add(x, y);
        add = Expression.Lambda<Func<T, T, T>>(
            body, x, y).Compile();
    }
}

Because we compile the delegate in the static constructor, it only happens once per type T, and is then re-used – so in any calling code we can now use Operator<T>. Interestingly, the simple “Add” method hides a great deal of complexity: value- vs reference-types, primitive operators (with raw IL counterparts), bespoke operators (methods added as part of a type definition) and lifted operators (operators provided automatically against Nullable<T>).

This works even for user-defined types - as long as you have defined the + operator; if no suitable operator can be found then it will throw an exception at runtime. In reality this is rarely an issue, as you are unlikely to try using Add unless you believe the type has a sensible operator.

A full implementation of generic operators is freely provided in the MiscUtil library.

What does Expression support?

In .NET 3.5, Expression supports a wide range of the operations required to either query data or to create new objects:

  • Arithmetic: Add, AddChecked, Divide, Modulo, Multiply, MultiplyChecked, Negate, NegateChecked, Power, Subtract, SubtractChecked, UnaryPlus
  • Creation: Bind, ElementInit, ListBind, ListInit, MemberBind, MemberInit, New, NewArrayBounds, NewArrayInit
  • Bitwise: And, ExclusiveOr, LeftShift (<<), Not, Or, RightShift (>>)
  • Logical: AndAlso (&&), Condition (? :), Equal, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual, NotEqual, OrElse (||), TypeIs
  • Member Access: ArrayIndex, ArrayLength, Call, Field, Property, PropertyOrField
  • Other: Convert, ConvertChecked, Coalesce (??), Constant, Invoke, Lambda, Parameter, TypeAs, Quote

For example, we could implement a shallow clone looking at the public members of a type – essentially writing (on-the-fly) an implementation like:

person => new Person 

{  

    Name = person.Name,  

    DateOfBirth = person.DateOfBirth, ...  

}; 

To do this, we need the MemberInit method:

private static readonly Func<T, T> shallowClone; 

static Operator() 

{ 

     var orig = Expression.Parameter(typeof(T), "orig");
 

     // for each read/write property on T, create a 


// new binding (for the object initializer) that

// copies the original's value into the new object
var setProps = from prop in typeof(T).GetProperties ( BindingFlags.Public | BindingFlags.Instance) where prop.CanRead && prop.CanWrite select (MemberBinding) Expression.Bind( prop, Expression.Property(orig, prop)); var body = Expression.MemberInit( // object initializer Expression.New(typeof(T)), // ctor
setProps // property assignments ); shallowClone = Expression.Lambda<Func<T, T>>( body, orig).Compile(); }

This will out-perform standard reflection (on an object-by-object basis) by a huge margin, without the need to maintain a separate clone method per type and the risk of forgetting to add new properties to the clone.

A similar approach might be used, for example, to map data between DTO entities and entity objects, or between DataTable and class representations of objects – or perhaps to compare two objects for property equality – writing something like:

(x,y) => x.Name == y.Name &&
 
   x.DateOfBirth == y.DateOfBirth && ...; 

To do this, we need to use AndAlso to combine each operation:

private static readonly Func<T, T, bool> propertiesEqual;
static Operator()
{
   var x = Expression.Parameter(typeof(T), "x"); 

   var y = Expression.Parameter(typeof(T), "y"); 


   var readableProps =
 
       from prop in typeof(T).GetProperties( 

           BindingFlags.Public | BindingFlags.Instance) 

       where prop.CanRead 

       select prop; 


    Expression combination = null; 

    foreach (var prop in readableProps) 

    { 

       var thisPropEqual = Expression.Equal( 

           Expression.Property(x, prop), 

           Expression.Property(y, prop)); 


       if (combination == null) 

       { // first 

          combination = thisPropEqual; 

       } 

       else 

       { // combine via && 

            combination = Expression.AndAlso( 
                   combination, thisPropEqual); 

       } 

   } 


   if (combination == null) 
   { // nothing to test; return true 

       propertiesEqual = delegate { return true; }; 
   } 

   else 
   { 
       propertiesEqual = Expression.Lambda<Func<T, T, bool>>(
              combination, x, y).Compile();
   } 

} 

Limitations of Expression – and the future

So far, Expression has seemed pretty versatile – however, there are some significant limitations that relate to Expression’s LINQ origins:

  • There is no inbuilt mechanism for changing object properties / fields
  • There is no way of performing a series of arbitrary operations

In the above example, we got away with performing multiple steps because we could chain all of our AndAlso into a single logical statement. However, there is simply no way (in .NET 3.5) of expressing something like the following in an expression tree (for use as a compiler):

person.DateOfBirth = newDob; 

person.Name = newName; 

person.UpdateFriends(); 

person.Save(); 

The Bind operation we saw earlier only works when creating new objects. We could obtain the setter method, but we also have no way to chain method calls together (except as part of a “fluent” API, which this isn’t).

Fortunately, .NET 4.0 extends the Expression API with additional types and methods, in order to support the Dynamic Language Runtime (DLR). This covers a much broader spectrum of typical code:

  • Mutation: AddAssign, AddAssignChecked, AndAssign, Assign, DivideAssign, ExclusiveOrAssign, LeftShiftAssign, ModuloAssign, MultiplyAssign, MultiplyAssignChecked, OrAssign, PostDecrementAssign, PostIncrementAssign, PowerAssign, PreDecrementAssign, PreIncrementAssign, RightShiftAssign, SubtractAssign, SubtractAssignChecked
  • Arithmetic: Decrement, Default, Increment, OnesComplement
  • Member Access: ArrayAccess, Dynamic
  • Logical: ReferenceEqual, ReferenceNotEqual, TypeEqual
  • Flow: Block, Break, Continue, Empty, Goto, IfThen, IfThenElse, IfFalse, IfTrue, Label, Loop, Return, Switch, SwitchCase, Unbox, Variable
  • Exceptions: Catch, Rethrow, Throw
  • Debug: ClearDebugInfo, DebugInfo

This allows much greater use when writing dynamic code (indeed, it even includes the facility to compile into a MethodBuilder for use in runtime type generation). For example, to write a simple “for” loop that prints the numbers 0 thru 9:

            var exitFor = Expression.Label("exitFor"); // jump label
var x = Expression.Variable(typeof(int), "x"); var body = Expression.Block(new[] { x }, // declare scope variables
Expression.Assign(x, Expression.Constant(0, typeof(int))), // init
Expression.Loop( Expression.IfThenElse( Expression.GreaterThanOrEqual( // test for exit x, Expression.Constant(10, typeof(int)) ), Expression.Break(exitFor), // perform exit
Expression.Block( // perform code Expression.Call( typeof(Console), "WriteLine", null, x), Expression.PostIncrementAssign(x) ) ), exitFor)); var runtimeLoop = Expression.Lambda<Action>(body).Compile();

Which might not seem hugely impressive – but for writing compiled code at runtime, this is a lot more convenient and flexible than the alternatives.

Earlier we looked at how to clone an object using properties – but we now have the ability to push data from one object into another – very useful with ORM tools where we might need to obtain a change-tracked object, update it with external changes (from either a DTO or a detached instance), and submit the changes. We can do this as follows:

            var source = Expression.Parameter(typeof(T), "source");

            var dest = Expression.Parameter(typeof(T), "dest");



            // for each read/write property, copy the source's value

// into the destination object
var body = Expression.Block( from prop in typeof(T).GetProperties( BindingFlags.Public | BindingFlags.Instance) where prop.CanRead && prop.CanWrite select Expression.Assign( Expression.Property(dest, prop), Expression.Property(source, prop))); var copyMethod = Expression.Lambda<Action<T,T>>( body, source, dest).Compile();

Expressions with runtime Type creation

Another interesting feature in .NET 4.0 is that in addition to compiling an Expression to a standalone delegate, you can now use an expression tree to provide the body of a MethodBuilder (used when creating new Types at runtime), via the new CompileToMethod. This hints at a future where we can write rich types at runtime (including polymorphism and interface implementation) without ever having to touch raw IL.

Advanced Expressions in the language

Although the runtime now supports much more sophisticated expression trees, there is no additional language support for this in C# 4.0 – so the only way to create this type of expression is manually. This isn’t actually an issue, as no LINQ provider will support these expressions, and if you know the code at compile-time, you’d just write a regular method / anonymous method. It does leave some interesting possibilities for post-4.0, though. For example (and this is pure speculation), imagine if the language could use Expression.Assign and Expression.Block to build an expression tree for use with a database, something like:

        // imaginary code; this doesn’t work

        myDatabase.Customers

             .Where(c => c.Region == "North")

             .Update(c => {

                 c.Manager = "Fred";

                 c.Priority = c.Priority + 10;

             });

I stress that this is just “what if?” – it would require a huge amount of work to implement by the LINQ providers, but as we’ve shown above: in theory the Expression object is capable of encapsulating our intent (to update multiple properties of an entity). Major language and framework changes don’t come easily though; only time will tell.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

CodeDOM by David L. S.

I'm not a expert on CodeDOM, but it seems that Expression is going on the same direction. If it is the case, it would be nice to have a comparative between them.

excellent article, thanks Marc by Bill Woodruff

While I can't honestly say I "understand" the concepts on first reading, I think you've done a great job of enabling me to focus my study in a productive way so that eventually I "will" understand the very interesting features of LINQ you are demonstrating.

best, Bill
dotScience

Re: excellent article, thanks Marc by Marc Gravell

While I can't honestly say I "understand" the concepts on first reading


Indeed, the Expression API isn't trivial; but once it "clicks" (which doesn't take much playing) it really is very powerful. If you have any specific questions, please let me know.

Very nice by Paul Florin

I'm still learning about expression trees and I was wondering if you could help me modify your code slightly so that I am able to only copy the properties that aren't null from a source object ot a destination object:

private static Func<TSource, TTarget> BuildCopier()
{
ParameterExpression sourceParameter = Expression.Parameter(typeof(TSource), "source");
ParameterExpression targetParameter = Expression.Parameter(typeof(TTarget), "destination");
var bindings = new List<MemberBinding>();
foreach (PropertyInfo sourceProperty in typeof(TSource).GetProperties())
{
PropertyInfo targetProperty = typeof(TTarget).GetProperty(sourceProperty.Name);
// Pseudo-code
if (sourceValue != null)
bindings.Add(Expression.Bind(targetProperty, Expression.Property(sourceParameter, sourceProperty)));
}
// ...
return Expression.Lambda<Func><TSource,TTarget>>(initializer, sourceParameter).Compile();
}

Thanks</tsource,ttarget></memberbinding></tsource,>

Re: Very nice by Marc Gravell

Hi Paul - sorry, the e-mail notification failed! In 3.5, that would be a challenge, as it is *essentially* writing an object initializer, i.e.

Foo foo = new Foo { A = old.A, B = old.B };

It would be hard to make that not do the assignments. It could be done in 4.0, though - let me know if you want an example.

System.Security.VerificationException by Tejas Sharma

Hi Marc
Thanks for the post, I found it to be very informational and a great introduction to expressions. I did have a question regarding your property equality checker method. I ran the same with a simple anonymous type and it worked just fine. However, when I ran it against two strings, my application threw a VerificationException with the message "Operation could destabilize the runtime." Would you happen to know why this particular exception was thrown and if there are similar caveats to watch out for when writing expressions which are to be compiled?
Thanks
Tejas

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

6 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT