An aggregate is a function that takes a collection of values and returns a scalar value. Examples from T-SQL include MIN, MAX, and SUM. VB and C# both support aggregates, but in quite different ways.
Both languages can call aggregates as extension methods. Using dot notation, one simply calls a method such as Sum() on an IEnumerable object. For example:
var totalVirtualMemory =
    (from p in Process.GetProcesses() select p.VirtualMemorySize64).Sum();
Dim totalVirtualMemory = _
(From p In Process.GetProcesses _
Select p.VirtualMemorySize64).Sum
As you can see, the VB and C# versions are nearly identical. VB also exposes a LINQ syntax specifically for aggregates.
Dim totalVirtualMemory = Aggregate p In Process.GetProcesses _
    Into Sum(p.VirtualMemorySize64)
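C# has no dedicated aggregate clause, but it can express the same total in several equivalent ways. The sketch below assumes C# 9 or later for top-level statements. It uses a hypothetical in-memory array in place of Process.GetProcesses() so the output is deterministic. It shows three forms: the query used above, the equivalent Select(...).Sum() call, and the Sum overload that takes a selector. The selector overload roughly corresponds to VB's Aggregate ... Into Sum(...).

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for Process.GetProcesses(), so the result is deterministic.
var processes = new[]
{
    new { ProcessName = "a", VirtualMemorySize64 = 4096L },
    new { ProcessName = "b", VirtualMemorySize64 = 8192L },
};

// 1. Query expression, then the parameterless Sum() on IEnumerable<long>.
long viaQuery = (from p in processes select p.VirtualMemorySize64).Sum();

// 2. The same thing in method syntax: Select, then Sum().
long viaSelect = processes.Select(p => p.VirtualMemorySize64).Sum();

// 3. The Sum overload that takes a selector; roughly the C# counterpart
//    of VB's "Aggregate p In ... Into Sum(p.VirtualMemorySize64)".
long viaSelector = processes.Sum(p => p.VirtualMemorySize64);

Console.WriteLine($"{viaQuery} {viaSelect} {viaSelector}"); // 12288 12288 12288
```

All three compile to calls on System.Linq.Enumerable. The query form is syntactic sugar over the method calls.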
If this were the only difference, there wouldn't be anything to talk about. But things get interesting when you want to operate on more than one "column" at a time. For the sake of example, let's say you were interested in both the total virtual memory and the total working set (physical memory) currently in use.
Using anonymous classes, you could easily create one variable with both those values.
var totals = new
{
    totalVirtualMemory = (from p in Process.GetProcesses() select p.VirtualMemorySize64).Sum(),
    totalWorkingSet = (from p in Process.GetProcesses() select p.WorkingSet64).Sum()
};
The problem with this is that GetProcesses() is called twice. That means the OS is queried twice, and the code loops through two separate result collections. A faster approach is to cache the result of GetProcesses().
var processes = (from p in Process.GetProcesses()
                 select new { p.VirtualMemorySize64, p.WorkingSet64 }).ToList();
var totals2 = new
{
    totalVirtualMemory = (from p in processes
                          select p.VirtualMemorySize64).Sum(),
    totalWorkingSet = (from p in processes
                       select p.WorkingSet64).Sum()
};
This is closer, but there are still two loops through the collection. To fix this, a custom aggregator is needed, as well as a named class to hold the results.
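using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
// Custom aggregate: walks the sequence once and sums both columns.
// Extension methods must be declared in a static class; ProcessExtensions is a hypothetical name.
public static class ProcessExtensions
{
    public static ProcessTotals Sum(this IEnumerable<Process> source)
    {
        var totals = new ProcessTotals();
        foreach (var p in source)
        {
            totals.VirtualMemorySize64 += p.VirtualMemorySize64;
            totals.WorkingSet64 += p.WorkingSet64;
        }
        return totals;
    }
}
// Named class to hold the results.
public class ProcessTotals
{
    public long VirtualMemorySize64 { get; set; }
    public long WorkingSet64 { get; set; }
}
// Usage:
var totals3 = (from p in Process.GetProcesses() select p).Sum();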
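You can check the difference in passes by counting how often a lazily evaluated source is enumerated. The sketch below assumes C# 9 or later. A hypothetical local iterator, Source(), stands in for a query over processes and increments a counter each time it is enumerated from the start. Two Sum calls make two passes, while a single loop of the kind used in the custom Sum above makes one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int passes = 0;

// Hypothetical lazy source standing in for a query over processes.
// Each new enumeration re-runs the iterator body, so 'passes' counts passes.
IEnumerable<(long Virtual, long Working)> Source()
{
    passes++;
    yield return (100L, 10L);
    yield return (200L, 20L);
    yield return (300L, 30L);
}

var source = Source();

// Two separate Sum calls: each one enumerates the source.
long virtualTotal = source.Sum(p => p.Virtual);
long workingTotal = source.Sum(p => p.Working);
int passesWithTwoSums = passes;

// One hand-written loop, like the custom Sum extension: a single pass.
passes = 0;
long v = 0, w = 0;
foreach (var p in source)
{
    v += p.Virtual;
    w += p.Working;
}
int passesWithOneLoop = passes;

Console.WriteLine($"{virtualTotal} {workingTotal} {passesWithTwoSums}"); // 600 60 2
Console.WriteLine($"{v} {w} {passesWithOneLoop}");                        // 600 60 1
```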
A developer could do the same thing in Visual Basic, but there is another option.
Dim totals3 = Aggregate p In Process.GetProcesses _
Into virtualMemory = Sum(p.VirtualMemorySize64), _
workingSet = Sum(p.WorkingSet64)
Just as in the last C# example, you end up with a variable that has two fields. But unlike in C#, you do not have to choose between writing your own aggregate function and class, or wasting cycles looping through the collection twice.
To be fair, C# still has one more trick up its sleeve. Unlike VB, which only supports single-line anonymous functions, C# lets an anonymous function contain as many statements as necessary. This gives it the ability to create anonymous aggregate functions when needed.
var processes =
    (from p in Process.GetProcesses()
     select new { p.VirtualMemorySize64, p.WorkingSet64 });
var totals4 = processes.Aggregate(new ProcessTotals(), (sum, p) =>
{
    sum.WorkingSet64 += p.WorkingSet64;
    sum.VirtualMemorySize64 += p.VirtualMemorySize64;
    return sum;
});
Note that the class ProcessTotals is still needed. An anonymous class cannot be used here because C# anonymous classes are immutable. Visual Basic allows mutable anonymous classes, but that does not help here because VB cannot create a multi-line anonymous function.
While Visual Basic and C# have significantly more power than before, each illustrates areas where the other can be improved.
Community comments
F# version and a few comments...
by Tomas Petricek,
Hi, just for comparison, here is an F# version of the same thing:
It is interesting that even though C# and VB provide the same 'functional' approach, they still take about four times as many lines of code :-).
However, your C# sample can be simplified a bit as well, because you can in fact use anonymous classes like this (I have not tried it, but I think it should be correct):
I would just like to comment the last sentence in the article:
>> While Visual Basic and C# have significantly more power than before, each illustrates areas where the other can be improved.
I think the VB query syntax is quite interesting from the VB developer's perspective, because VB tries to include everything developers may ever need in the language, and the VB queries are very easy to use thanks to very good IDE support.
On the other hand, I understand very well the people who say that including queries (as an ad-hoc feature) in C# is a bad thing, because C# shouldn't contain a feature that isn't useful in a more general way. Even though you can use C# queries to query anything, they are still useful only as queries, which is not as broad as it could be.
I haven't made up my mind about this yet, but I would say that a larger difference between C# and VB wouldn't be a bad thing. I think positioning C# more as a language for developing .NET libraries (and of course LINQ providers) and VB as a language for RAD would make some sense...
LINQ Aggregates & Resolving Relationships pitfalls & solutions
by Hartmut Wilms,
Ian Griffiths has a very good post on his blog, LINQ to SQL, Aggregates, EntitySet, and Quantum Mechanics, about LINQ aggregates and the very convenient LINQ feature that automatically resolves relationships:
The critical point here is to know what LINQ is doing for you. The ability to follow relationships between tables using simple property syntax in C# can simplify some code considerably. With appropriately-formed queries, LINQ to SQL’s smart transformation from C# to SQL will generate efficient queries when you use these relationships. But if you’re not aware of what the LINQ operations you’re using do, it’s easy to cause big performance problems.
As always, a sound understanding of your tools is essential.
Re: F# version and a few comments...
by h b,
Hi,
I found myself writing the same code you suggest, but I realized that it doesn't compile.
Actually, the compiler can't figure out whether the inferred anonymous class within the processes enumerator is the same as the one returned by the lambda expression passed to Aggregate.
Fortunately, I already have a generic implementation of tuples; this seems to be what F# does.
BTW, I have implemented most of the functional basics (map, reduce or fold, ...), so I use neither the syntactic sugar of C# LINQ query expressions nor the LINQ functions.
Using my own methods I can write:
var z = F.map_reduce(
Process.GetProcesses()
, (x) => F.tupple(x.VirtualMemorySize64, x.WorkingSet64) //map to tupple
, (x, y) => F.tupple(x._1 + y._1, x._2 + y._2) //reduction
, F.tupple((long)0, (long)0)); //initial value
The only thing I can't do with C# is monad composition. But I agree, FP with C# is a little verbose...
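h b's F.map_reduce and F.tupple come from his own library and are not part of LINQ. The same map-then-reduce shape can be written with standard LINQ alone: Select is the map step and the seeded Aggregate overload is the reduction. This rough sketch makes several assumptions. It uses C# 7 value tuples in place of F.tupple (value tuples postdate this discussion) and C# 9 top-level statements. A hypothetical in-memory array stands in for Process.GetProcesses().

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for Process.GetProcesses().
var processes = new[]
{
    new { VirtualMemorySize64 = 1000L, WorkingSet64 = 100L },
    new { VirtualMemorySize64 = 2000L, WorkingSet64 = 200L },
};

var z = processes
    // map: project each element to a value tuple (playing the role of F.tupple)
    .Select(x => (Virtual: x.VirtualMemorySize64, Working: x.WorkingSet64))
    // reduce: add the pairs component-wise, starting from the initial value (0, 0)
    .Aggregate((Virtual: 0L, Working: 0L),
               (acc, t) => (acc.Virtual + t.Virtual, acc.Working + t.Working));

Console.WriteLine($"{z.Virtual} {z.Working}"); // 3000 300
```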
The core point of this article is wrong
by Stefan Wenig,
While VB does have syntactic sugar for aggregation, it is not true that it avoids iterating the source for every aggregation.
This code:
becomes this (in C#, this would be totally unreadable in VB):
Both invocations of Sum would iterate the list.
However, since GetProcesses() returns a list, there is no significant cost associated with iterating it twice. This would only be an issue when you're iterating lists that cannot easily be held in memory. (But it does not apply to LINQ to RDBMS, where the heavy lifting is done via generated SQL set operations.)
So this leaves us with a very narrow field of cases where multiple iterations should be avoided. In this case, h b's method would be the way to go. This can even be expressed using standard LINQ operators, for those who don't feel the need to create their own FP library. (I don't see the need for map_reduce, reduce is sufficient here, and it's called Aggregate in LINQ.)
No need for ProcessTotals; type inference works perfectly well for anonymous types here. Tuples would have the advantage of less typing and fewer possible typos, but the code and the are ugly and unintuitive.
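The code for this comment is missing here. As a sketch of the point being made (not the commenter's original code), the seeded Aggregate overload can accumulate into an anonymous type directly. Because anonymous types are immutable, it returns a new instance at each step. The data is a hypothetical in-memory array standing in for Process.GetProcesses(), and top-level statements assume C# 9 or later. One detail matters for this to compile: the seed must use 0L rather than 0. With int literals, the seed's properties would be int, so the lambda's long-typed result would be a different anonymous type.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for Process.GetProcesses().
var processes = new[]
{
    new { Name = "a", VirtualMemorySize64 = 100L, WorkingSet64 = 10L },
    new { Name = "b", VirtualMemorySize64 = 200L, WorkingSet64 = 20L },
    new { Name = "c", VirtualMemorySize64 = 300L, WorkingSet64 = 30L },
};

// The seed fixes the accumulator type (TAccumulate); 0L keeps its properties long.
var totals = processes.Aggregate(
    new { VirtualMemorySize64 = 0L, WorkingSet64 = 0L },
    // Anonymous types are immutable, so each step builds a new instance.
    // Same property names, types, and order as the seed => same anonymous type.
    (sum, p) => new
    {
        VirtualMemorySize64 = sum.VirtualMemorySize64 + p.VirtualMemorySize64,
        WorkingSet64 = sum.WorkingSet64 + p.WorkingSet64
    });

Console.WriteLine($"{totals.VirtualMemorySize64} {totals.WorkingSet64}"); // 600 60
```

This makes a single pass without a named result class. The cost is one small allocation per element.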