BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Aaron Erickson on LINQ and i4o

Aaron Erickson on LINQ and i4o

Aaron Erickson of Magenic has been working on an extension for LINQ called Indexes for Objects or i4o. In a nutshell, this allows in-memory collections to be indexed in a way that is supported by LINQ. The project is open source and can be downloaded from CodePlex.

Jonathan Allen (JA): Why did you originally start working on i4o?

Aaron Erickson (AE): Well - it really happened by accident. I was working on a project for a client that involved working with large, in memory collections - pricing bundle optimization for a large buyer of pharmaceutical products. Now, of course, this is in .NET 2.0, so we have no LINQ - so I was finding myself implementing numerous custom methods to find particular items in these collections. In the course of implementing these "find" methods, it was not too much of a stretch to realize that indexes on various collections would dramatically improve performance. So I manually implemented indexes on certain fields of my collections in order to make our bundle price analysis computations run in a reasonable amount of time.

At around the same time I was working on that project, I was preparing for a demonstration of LINQ I was giving to the Chicago .NET Users Group (CNUG). It occurred to me that, just like databases benefit from indexes, so would large in-memory collections. I started of thinking of every application I have ever been called in as a consultant to optimize... and sure enough, 90% of the improvement often came from simply finding the best indexes for frequently run queries. It was at that moment when the idea that became i4o was born - the idea that we need a way to easily make indexed collections that "just work" with LINQ.

JA: In your blog you mentioned expression trees. How difficult is it to work with them? Do you see your average programmer touching them from time to time, or is it targeted more towards API and library designers?

AE: Personally, I have a (small) amount of LISP background - which made the idea of expression trees not too much of a stretch. The hardest part of working with them was just the trial and error of figuring out how the API worked. I would imagine that working with them using a dynamic language like Ruby or Lisp is easier than C#, where you spend a lot of code casting nodes between different node types, but overall, once you have your head around the idea of "code that writes code", it isn't too bad.

As far as the average developer, I would guess that the same developers that use reflection frequently now might play with expression trees once Orcas is released. This is both good and bad - I have seen a lot of "reflection overkill" (using reflection because it is cool, not because it is useful), and for the same reasons, you might see overuse of expressions. That said, there are some real good uses for expressions, especially in cases where you are implementing a dynamic strategy pattern (i.e. a strategy pattern that adjusts the strategy at runtime).

JA: Can you give a brief example of working with expression trees?

AE: Sure - let's start with something simple - build an expression that does something, determined by the user, with two numbers together and returns the result:

using System;
using System.Linq;
using System.Linq.Expressions;

namespace ExpressionDemo
{
class Program
{
static void Main(string[] args)
{
//create two parameter objects
ParameterExpression xParam = Expression.Parameter(typeof(int), "x");
ParameterExpression yParam = Expression.Parameter(typeof(int), "y");
//create an array from the objects, used when creating the lambda at runtime
ParameterExpression[] myParams = { xParam, yParam };
//create an add operation at runtime
BinaryExpression operation = Expression.Add(xParam,yParam);
//do a simple menu that allows the user to determine what code we write
Console.WriteLine("We are going to do something to 69 and 42...");
Console.WriteLine("Press m to multiply");
Console.WriteLine("Press a to add");
Console.WriteLine("Press s to subtract");
Console.WriteLine("Press d to divide");

ConsoleKeyInfo keyInfo = Console.ReadKey();
//based on user input, create the body of the expression
switch (keyInfo.KeyChar)
{
case 'm':
operation = Expression.Multiply(xParam, yParam);
break;
case 'a':
operation = Expression.Add(xParam, yParam);
break;
case 's':
operation = Expression.Subtract(xParam, yParam);
break;
case 'd':
operation = Expression.Divide(xParam, yParam);
break;
}
//we now have everything we need to create a runtime lambda expression
// let's do so
LambdaExpression simpleOpLambda = Expression.Lambda<Func<int, int, int>>(
operation,
myParams);
//ahh - this is where the magic is
// Compile on lambda methods creates a dynamic method
// at runtime. Whats cool about this is we build a data structure
// based on user input that, through Compile() turns into code at run time
int result = (int)simpleOpLambda.Compile().DynamicInvoke(69, 42);
//Show the result to the user.
Console.WriteLine("");
Console.WriteLine("Result = " + result);
Console.WriteLine("Press the 'any' key to exit...");
Console.ReadKey();

}
}
}

Of course, this is a trivial example that does not do anything particularly useful, but what is important about this is that we built a method at runtime without having to Emit IL to do so. Writing code that writes code is what is new and different about this.

In i4o, we don't modify expression trees, but we certainly look at them. When you write the where clause in LINQ, you are writing an expression that can be analyzed. All i4o has to do is look to see that a.) it is an equality expression b.) the left side references a property we have an index for and c.) the right side is either a constant, or can be evaluated into something we can look up in an index. Once those conditions are met, we take what we need from the expression, and use that information to perform our own lookup logic.

JA: Do you see the Dynamic Language Runtime fitting into your future plans for i4o?

AE: First off, of course, i4o obviously needs to be able to extend to DLR based languages. The question is just how that will be done. As a nice stroke of luck, Jim Hugunin has recently posted that the DLR expression trees will be seen as statically typed and bound trees at runtime (http://blogs.msdn.com/hugunin/archive/2007/05/15/dlr-trees-part-1.aspx). The really tricky part of i4o, evaluation of the expression tree at runtime to determine whether you can use an index, will continue to be not only possible, but should work in its current implementation.

The next issue in making i4o support the DLR is support of attributes in DLR based languages. Last time I checked, support for CLR style custom attributes in some of the various languages is still being worked on. That said, it is not a big leap to assume that Microsoft will come up with some form of declarative decorators (attributes) across the languages. In the event that does not occur, it would not be hard to add some utility in IndexableCollection to support a means of index declaration that does not depend on the decorators. In fact, it is likely that i4o will see such a mechanism soon, so that optimizations can be done that allow consumers of i4o to determine at runtime which fields should be indexed (i.e. CreateIndex(PropertyInfo propertyToAdd), RemoveIndex(PropertyInfo propertyToRemove ), etc.).

JA: Do you see Microsoft's plans to have C# anonymous classes immutable and VB anonymous class's key support affecting your design for i4o?

AE: I find it unlikely that it will affect the design of i4o much more than the current problem of handling property changes on indexed fields. If a property changes during the course of a query, thereby changing the index, then if you have property change events, your index would update and the query would proceed correctly. Of course, and this is my opinion, if you are doing a query where you are searching for a field that changes during the query - I am very curious as to why. The performance in i4o would probably be less than optimal in that event, since every record you touch and change produces an index update. Regardless, I will probably follow the lead of Microsoft in terms of how they handle changes to their implementation of where, if that happens as a result of this decision on C# and mutability of anonymous types.

I do find it very interesting that the VB team came up with the declared Key concept. It is likely that i4o will recognize said key fields and index those automatically.

In addition to i4o, Aaron Erickson is working on MetaLinq, a library for manipulating expression trees.

Rate this Article

Adoption
Style

BT