BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Aaron Erickson on LINQ and i4o

Aaron Erickson on LINQ and i4o

Bookmarks

Aaron Erickson of Magenic has been working on an extension for LINQ called Indexes for Objects or i4o. In a nutshell, this allows in-memory collections to be indexed in a way that is supported by LINQ. The project is open source and can be downloaded from CodePlex.

Jonathan Allen (JA): Why did you originally start working on i4o?

Aaron Erickson (AE): Well - it really happened by accident. I was working on a project for a client that involved working with large, in memory collections - pricing bundle optimization for a large buyer of pharmaceutical products. Now, of course, this is in .NET 2.0, so we have no LINQ - so I was finding myself implementing numerous custom methods to find particular items in these collections. In the course of implementing these "find" methods, it was not too much of a stretch to realize that indexes on various collections would dramatically improve performance. So I manually implemented indexes on certain fields of my collections in order to make our bundle price analysis computations run in a reasonable amount of time.

At around the same time I was working on that project, I was preparing for a demonstration of LINQ I was giving to the Chicago .NET Users Group (CNUG). It occurred to me that, just like databases benefit from indexes, so would large in-memory collections. I started of thinking of every application I have ever been called in as a consultant to optimize... and sure enough, 90% of the improvement often came from simply finding the best indexes for frequently run queries. It was at that moment when the idea that became i4o was born - the idea that we need a way to easily make indexed collections that "just work" with LINQ.

JA: In your blog you mentioned expression trees. How difficult is it to work with them? Do you see your average programmer touching them from time to time, or is it targeted more towards API and library designers?

AE: Personally, I have a (small) amount of LISP background - which made the idea of expression trees not too much of a stretch. The hardest part of working with them was just the trial and error of figuring out how the API worked. I would imagine that working with them using a dynamic language like Ruby or Lisp is easier than C#, where you spend a lot of code casting nodes between different node types, but overall, once you have your head around the idea of "code that writes code", it isn't too bad.

As far as the average developer, I would guess that the same developers that use reflection frequently now might play with expression trees once Orcas is released. This is both good and bad - I have seen a lot of "reflection overkill" (using reflection because it is cool, not because it is useful), and for the same reasons, you might see overuse of expressions. That said, there are some real good uses for expressions, especially in cases where you are implementing a dynamic strategy pattern (i.e. a strategy pattern that adjusts the strategy at runtime).

JA: Can you give a brief example of working with expression trees?

AE: Sure - let's start with something simple - build an expression that does something, determined by the user, with two numbers together and returns the result:

using System;
using System.Linq;
using System.Linq.Expressions;

namespace ExpressionDemo
{
class Program
{
static void Main(string[] args)
{
//create two parameter objects
ParameterExpression xParam = Expression.Parameter(typeof(int), "x");
ParameterExpression yParam = Expression.Parameter(typeof(int), "y");
//create an array from the objects, used when creating the lambda at runtime
ParameterExpression[] myParams = { xParam, yParam };
//create an add operation at runtime
BinaryExpression operation = Expression.Add(xParam,yParam);
//do a simple menu that allows the user to determine what code we write
Console.WriteLine("We are going to do something to 69 and 42...");
Console.WriteLine("Press m to multiply");
Console.WriteLine("Press a to add");
Console.WriteLine("Press s to subtract");
Console.WriteLine("Press d to divide");

ConsoleKeyInfo keyInfo = Console.ReadKey();
//based on user input, create the body of the expression
switch (keyInfo.KeyChar)
{
case 'm':
operation = Expression.Multiply(xParam, yParam);
break;
case 'a':
operation = Expression.Add(xParam, yParam);
break;
case 's':
operation = Expression.Subtract(xParam, yParam);
break;
case 'd':
operation = Expression.Divide(xParam, yParam);
break;
}
//we now have everything we need to create a runtime lambda expression
// let's do so
LambdaExpression simpleOpLambda = Expression.Lambda<Func<int, int, int>>(
operation,
myParams);
//ahh - this is where the magic is
// Compile on lambda methods creates a dynamic method
// at runtime. Whats cool about this is we build a data structure
// based on user input that, through Compile() turns into code at run time
int result = (int)simpleOpLambda.Compile().DynamicInvoke(69, 42);
//Show the result to the user.
Console.WriteLine("");
Console.WriteLine("Result = " + result);
Console.WriteLine("Press the 'any' key to exit...");
Console.ReadKey();

}
}
}

Of course, this is a trivial example that does not do anything particularly useful, but what is important about this is that we built a method at runtime without having to Emit IL to do so. Writing code that writes code is what is new and different about this.

In i4o, we don't modify expression trees, but we certainly look at them. When you write the where clause in LINQ, you are writing an expression that can be analyzed. All i4o has to do is look to see that a.) it is an equality expression b.) the left side references a property we have an index for and c.) the right side is either a constant, or can be evaluated into something we can look up in an index. Once those conditions are met, we take what we need from the expression, and use that information to perform our own lookup logic.

JA: Do you see the Dynamic Language Runtime fitting into your future plans for i4o?

AE: First off, of course, i4o obviously needs to be able to extend to DLR based languages. The question is just how that will be done. As a nice stroke of luck, Jim Hugunin has recently posted that the DLR expression trees will be seen as statically typed and bound trees at runtime (http://blogs.msdn.com/hugunin/archive/2007/05/15/dlr-trees-part-1.aspx). The really tricky part of i4o, evaluation of the expression tree at runtime to determine whether you can use an index, will continue to be not only possible, but should work in its current implementation.

The next issue in making i4o support the DLR is support of attributes in DLR based languages. Last time I checked, support for CLR style custom attributes in some of the various languages is still being worked on. That said, it is not a big leap to assume that Microsoft will come up with some form of declarative decorators (attributes) across the languages. In the event that does not occur, it would not be hard to add some utility in IndexableCollection to support a means of index declaration that does not depend on the decorators. In fact, it is likely that i4o will see such a mechanism soon, so that optimizations can be done that allow consumers of i4o to determine at runtime which fields should be indexed (i.e. CreateIndex(PropertyInfo propertyToAdd), RemoveIndex(PropertyInfo propertyToRemove ), etc.).

JA: Do you see Microsoft's plans to have C# anonymous classes immutable and VB anonymous class's key support affecting your design for i4o?

AE: I find it unlikely that it will affect the design of i4o much more than the current problem of handling property changes on indexed fields. If a property changes during the course of a query, thereby changing the index, then if you have property change events, your index would update and the query would proceed correctly. Of course, and this is my opinion, if you are doing a query where you are searching for a field that changes during the query - I am very curious as to why. The performance in i4o would probably be less than optimal in that event, since every record you touch and change produces an index update. Regardless, I will probably follow the lead of Microsoft in terms of how they handle changes to their implementation of where, if that happens as a result of this decision on C# and mutability of anonymous types.

I do find it very interesting that the VB team came up with the declared Key concept. It is likely that i4o will recognize said key fields and index those automatically.

In addition to i4o, Aaron Erickson is working on MetaLinq, a library for manipulating expression trees.

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Yet another person re-invents pattern matching

    by peter lin,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    It's interesting to see people re-invent pattern matching, rather than simply read the mountain of literature in this domain. Creating in-memory indexes of data is old stuff. One just needs to understand relational theory, pattern matching and discrimination networks to build efficient and scalable indexing engine for in-memory data. There's no need to re-invent stuff when it's been around for over 2 decades.

  • Re: Yet another person re-invents pattern matching

    by Jonathan Allen,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I have to say your comment doesn't make much sense to me. We ran the story because Aaron was the first person we heard about that applied the technique in a way that worked with LINQ to Objects. No one claimed he (re)invented indexing or pattern matching.

  • Re: Yet another person re-invents pattern matching

    by peter lin,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Sure, it's new to LINQ and .NET, but it's old stuff. Prolog and smalltalk has been doing this kind of stuff for over a decade. My criticism is a bit harsh, since I don't see the benefit of linq. it is different and useful, but not better than other approaches. If developers are writing business logic using linq, that could very easily lead to a pile of junk. my bias 2 cents.

  • Re: Yet another person re-invents pattern matching

    by Aaron Erickson,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Yes, Peter, you are right :)

    It is a really simple technique. Pattern matching it isn't, but basic hashtable theory it is.

    Often, it isn't how clever your theory is (mine isn't terribly clever), but it is simply the act of making sure it is available at a time when others might need it. Frankly. I waited several months after I thought of the idea - thinking "sure, MSFT has thought of this... why bother to write it". Only after I realized LINQ was in beta and not likely to change, did I bother to go ahead and code it, mostly because nobody else was bothering.

    The benefit isn't the hashtable stuff, directly, but it is making it easy to use for the millions of C# developers, rather than the small number of prolog and smalltalk developers that still exist.

    Frankly, I hope Microsoft implements something like this in a future version of the framework. I have no intent of making a living off i4o (that is what metalinq is for - lol - j/k). But rather, I don't want to see a bunch of people do LINQ for objects and end up writing crap code that searches through millions of objects without using indexes, and thereby making people write articles about how slow LINQ to objects is next year.

  • Re: Yet another person re-invents pattern matching

    by peter lin,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    my apologies if i was too harsh. I hope microsoft hooks MS BRE with LINQ to provide an efficient indexing engine. All they have to do is implement support for queries in MS BRE and it can easily provide high performance indexing. who knows, maybe MS is already considering that idea.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT