BT

InfoQ Homepage News LINQ Grouping Techniques

LINQ Grouping Techniques

Bookmarks

For the most part LINQ works very much like SQL. Sources, joins, selects, and where clauses are all pretty standard fare. The Group/By/Into clause is where this breaks down. Unlike SQL, which always returns a rectangular dataset, LINQ is capable of returning hierarchical data.

An example of this would be grouping customers by country and city. In SQL this would require either manually grouping on the client-side or performing 1 + N + (N*M) queries: one to get the countries, one for the cities in each country, and one for each customer list.

This can all be done with a single LINQ query by using a series of sub queries. However, the query gets increasing complex. Mitsu demonstrates:

var q = from c in db.Customers group c by c.Country into g select new { g.Key, Count = g.Count(), SubGroups = from c in g group c by c.City into g2 select g2};

Demonstrating the flexibility the LINQ framework, Mitsu reduces it down this single line:

var result = customers.GroupByMany(c => c.Country, c => c.City);

Mitsu did this in a general fashion suitable for any LINQ query. You can see the source code and an explanation of how it works on his blog.

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • err...

    by Frans Bouma /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    It can be done in 1 query?
    select count(customerid) as numcustomer, country, city
    from customers
    group by country, city
    order by country asc, numcustomer desc

    Then all you need is a reader which reads it in, one row at a time. You have to traverse the list just once.

    Having it projected for you in a hierarchy is great of course, but I find it misleading you'd need 1+N+(N*M) queries, which is simply nonsense.

    The downside of this particular linq construct is that the line between what's executed in the DB and what's executed in memory is blurred. Can you tell which group by is ran in the DB and which one in memory? IMHO it's better to keep things separated, so a developer doesn't fall into the trap where s/he thinks the query runs on the DB but actually it's ran in memory...

  • Re: err...

    by Frans Bouma /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Heh, actually, I ran the Linq query above using Linq to Sql and I get... I myriad of queries! Not just 1, but a lot.

    This is obvious because Linq to Sql can't merge hierarchical resultsets in memory, it does that by postponing queries for hierarchical data till the parent is evaluated (the customer) and then fetches for that row the child rows.

    So, although the idea is OK, don't be fooled by the MS marketing machine: unless the end result is 1 SQL query as I described above OR 2 (one for parents and 1 for children which are merged at runtime), you WILL get a lot of queries, and this will bring down performance tremendously.

  • Re: err...

    by Francois Ward /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Indeed, even the old dataset methods had better ways of doing it than LINQ to SQL, if what you say is true (I didn't play much with LINQ to SQL, but I thought it could actually handle this particular scenario correctly... I'm probably mistaken).

    Seriously though, LINQ is getting to be pretty disapointing... LINQ to SQL is a joke (though it is nice to have around for simpler applications, being built in all). Using LINQs on collections is much more viable, but even that is inefficient as hell, and anywhere where you need performance, you're better off doing things with the extension methods on your own...

    I guess LINQ to SQL is like everything else in .NET, an entry level tool, in the same way the basic toolbox controls for winforms and asp.net are just entry level stuff, and when you get serious you roll up your owns or use third party controls like Component Art, Telerik, DevExpress... Same deal now. Use LINQ to SQL for prototypes or entry level projects, and use something like LLBLGEN when you get serious.

    At least making a LINQ provider doesn't look TOO hard.

  • Re: err...

    by Jonathan Allen /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    It can be done in 1 query?
    select count(customerid) as numcustomer, country, city
    from customers
    group by country, city
    order by country asc, numcustomer desc


    That won't return all of the customer records, a goal of the above query.

    Also, you still have to group all the cities in each country.

    The downside of this particular linq construct is that the line between what's executed in the DB and what's executed in memory is blurred. Can you tell which group by is ran in the DB and which one in memory? IMHO it's better to keep things separated, so a developer doesn't fall into the trap where s/he thinks the query runs on the DB but actually it's ran in memory...


    I do believe that it is going to be a major performance problem down the road. As you pointed out, it can be hard to separate database queries from in-memory queries.

    Fortunately it is quite easy to figure out what LINQ to SQL is doing. You just need to pipe the DataContext.Log property to a stream like Console.Out.

    If you really want to make sure the query is run in-memory, you can call .ToList or .ToArray. The first query can still be in the database, but at least you know everything against the List or Array will be local.

  • Re: err...

    by Jonathan Allen /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I think the correct way to handle that would be to run a single query that gets back all the information you need, and turn it into a List. Then run all your grouping queries against that in-memory list.

    That said, doing it the lazy, one-query way is going to be really, really tempting.

  • Re: err...

    by Jonathan Allen /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    There are ways to improve in-memory queries. For example, there is "i4o" for indexing in-memory objects. www.infoq.com/news/2007/04/i40-intro

  • Re: err...

    by Frans Bouma /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    That won't return all of the customer records, a goal of the above query.

    No, the initial linq query also doesn't fetch the customer rows. This is logical, as 'c' isn't in scope after the 'into', also because if you would want to fetch all the customer data, you'd have a groupby query which had to group on all customer fields.

    Also, you still have to group all the cities in each country.

    No :) You can use a loop which traverses the results in 1 go and builds the same hierarchy using an O(n) algorithm. The count per city is already there, so all you have to do is add them per country, which can be done when you loop over the rows. You just have to compare previous with current row if it is a new country, but that's it.

  • Re: err...

    by Frans Bouma /

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Doing everything in memory is one way to do it, but it's still worse than O(n) because of the nature how the linq query has to be constructed: the nested grouping for the city is in the projection, so you can't execute that one AS WELL when traversing the list for grouping the city. If you want to do that internally in linq to objects, you've to analyze the whole query, which looks easy but is actually quite complex.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT

Is your profile up-to-date? Please take a moment to review and update.

Note: If updating/changing your email, a validation request will be sent

Company name:
Company role:
Company size:
Country/Zone:
State/Province/Region:
You will be sent an email to validate the new email address. This pop-up will close itself in a few moments.