Key takeaways
|
Donald Knuth famously said, “We should forget about small efficiencies, say about 97% of the time”. But when faced with the other 3%, it is good to know what’s going on behind the scenes. So in this article we’ll be taking a dive into the foreach loop.
A common misconception is the foreach loop in C# operates on IEnumerable. That is almost correct, but it actually operates on anything that looks like an IEnumerable. That means it must have a GetEnumerator method and that method must return an object (or struct) with Current and MoveNext methods, the latter of which returns a Boolean.
This was necessary back in the .NET 1.x era when we didn’t have generics or IEnumerable<T>. If you used a non-generic IEnumerable to loop over an integer array, it would have to allocate a new object for each item in the array (an operation known as boxing). As that would be ridiculously expensive, they decided C# would look for a custom enumerator first, and if it couldn’t find one then it would fall back on IEnumerable.GetEnumerator
.
Performance Testing foreach
This means we can obtain some surprising gains depending on how we declare our variables.
Here is our starting point, a simple loop that calculates a sum.
private static void Test1(IEnumerable list)
{
var sw = Stopwatch.StartNew();
var count = 0;
foreach (var item in list)
count += item;
sw.Stop();
Console.WriteLine("IEnumerable: " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
}
So how long does this take for a list of 100 million? Using BenchmarkDotNet on my laptop we’re looking at an average of 870.4265 ms (12.4169 ms StdDev). Not bad given the amount of data, but what if we used IList
instead of IEnumerable
?
private static void Test2(IList<int> list)
With no other changes, this gives us an average run time of 870.7999 ms (11.9365 ms StdDev). So close that given the margin of error you could say it was exactly the same. And it makes sense because either way we are getting the same enumerator. But what if we were to use a concrete class instead?
private static void Test3(List<int> list)
Again we’re testing on a noisy laptop, so some variance is to be expected. But surprisingly, I am getting an average runtime of 284.4808 ( 4.1080 ms StdDev). How do we account for a better than 67% improvement in runtime performance?
List<int>.GetEnumerator doesn’t return a IEnumerable<int>
As I mentioned in the beginning, a foreach loop only needs something that looks like an IEnumerable. This means you can use tricks otherwise be unavailable to you. For example, you can return a value type instead of an object as your enumerator. This is a small gain, but it does mean one less piece of memory to allocate now and garbage collect later.
More importantly, since we are using a struct instead of reference type or interface we gain the ability to use a normal function call instead of a virtual function call. With a normal call, you simply push your parameters onto the stack and jump to the right instruction. With a virtual call, you must first locate the correct virtual table (v-table) for the interface you are using. Then you find the function you want to call in that v-table, after which you can proceed as if it were a normal call.
Note: In most statically typed, OOP style languages, every class has multiple interfaces. There is the public interface, one or more base class interfaces, and any relevant abstract interfaces. Each of these gets its own v-table for looking up methods marked as abstract or virtual (hence the name v-table). Depending on how the compiler and/or runtime is designed, the protected and internal interfaces either get their own v-table or share with the public interface.
If you build your application with optimizations turned on (a.k.a. a release build), another factor comes into play. Because it knows exactly where the normal function is during the JIT process, the compiler has the option to skip the ceremony of a function call entirely and just copy the contents of the function you are calling into your function.
This is known as “inlining” and is very important for the optimization process. When I said, “you simply push your parameters onto the stack”, it’s a bit of an exaggeration. You also have to push the CPU registers and the return address onto the stack for when the function is complete. With inlining, those extra steps are not necessary. Furthermore, with everything now in one place the optimizer may see other ways to improve the code.
Note: Some runtimes, such as Java’s Hotspot VM can actually inline virtual methods if it thinks the method being called is always from the same concrete class. It is a rather advanced technique and requires the ability to “deoptimize” the code if it detects something has changed and the initial criteria no longer apply.
Disassembling the code:
We now turn our attention the disassembled IL code for the three tests.
Test 1: IEnumerable<int>
IL_0000: call class [System]System.Diagnostics.Stopwatch [System]System.Diagnostics.Stopwatch::StartNew() IL_0005: stloc.0 IL_0006: ldc.i4.0 IL_0007: stloc.1 IL_0008: ldarg.0 IL_0009: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1 class [mscorlib]System.Collections.Generic.IEnumerable`1::GetEnumerator() IL_000e: stloc.2 .try { IL_000f: br.s IL_001c IL_0011: ldloc.2 IL_0012: callvirt instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1 !0>::get_Current() IL_0017: stloc.3 IL_0018: ldloc.1 IL_0019: ldloc.3 IL_001a: add IL_001b: stloc.1 IL_001c: ldloc.2 IL_001d: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext() IL_0022: brtrue.s IL_0011 IL_0024: leave.s IL_0030 } // end .try finally { IL_0026: ldloc.2 IL_0027: brfalse.s IL_002f IL_0029: ldloc.2 IL_002a: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_002f: endfinally } // end handler IL_0030: ldloc.0 IL_0031: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()
Test 2: IList<int>
IL_0000: call class [System]System.Diagnostics.Stopwatch [System]System.Diagnostics.Stopwatch::StartNew() IL_0005: stloc.0 IL_0006: ldc.i4.0 IL_0007: stloc.1 IL_0008: ldarg.0 IL_0009: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1!0>!0> class [mscorlib]System.Collections.Generic.IEnumerable`1 ::GetEnumerator() IL_000e: stloc.2 .try { IL_000f: br.s IL_001c IL_0011: ldloc.2 IL_0012: callvirt instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1 ::get_Current() IL_0017: stloc.3 IL_0018: ldloc.1 IL_0019: ldloc.3 IL_001a: add IL_001b: stloc.1 IL_001c: ldloc.2 IL_001d: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext() IL_0022: brtrue.s IL_0011 IL_0024: leave.s IL_0030 } // end .try finally { IL_0026: ldloc.2 IL_0027: brfalse.s IL_002f IL_0029: ldloc.2 IL_002a: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_002f: endfinally } // end handler IL_0030: ldloc.0 IL_0031: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()
Test 3: List<int>
IL_0000: call class [System]System.Diagnostics.Stopwatch [System]System.Diagnostics.Stopwatch::StartNew() IL_0005: stloc.0 IL_0006: ldc.i4.0 IL_0007: stloc.1 IL_0008: ldarg.0 IL_0009: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator!0>!0>!0> class [mscorlib]System.Collections.Generic.List`1 ::GetEnumerator() IL_000e: stloc.2 .try { IL_000f: br.s IL_001d IL_0011: ldloca.s V_2 IL_0013: call instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator ::get_Current() IL_0018: stloc.3 IL_0019: ldloc.1 IL_001a: ldloc.3 IL_001b: add IL_001c: stloc.1 IL_001d: ldloca.s V_2 IL_001f: call instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator ::MoveNext() IL_0024: brtrue.s IL_0011 IL_0026: leave.s IL_0036 } // end .try finally { IL_0028: ldloca.s V_2 IL_002a: constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator IL_0030: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_0035: endfinally } // end handler IL_0036: ldloc.0 IL_0037: callvirt instance void [System]System.Diagnostics.Stopwatch::Stop()
As you look at this, it is pretty easy to spot the calls to methods on structures. They always use call instead of callvirt. But there is more going on here to be aware of. Consider line IL_0009 in test 3. We said earlier this wouldn’t be a virtual call, but it is in fact using callvirt.
This is where IL code can be a bit misleading. The callvirt operator doesn’t exactly mean “make a virtual call through a v-table”. Here is the breakdown:
|
call |
callvirt |
May be used on reference types |
yes |
yes |
May be used on value types |
yes |
only if constrained |
May be used on abstract interfaces |
no |
yes |
May call an instance (i.e. non-virtual) method |
yes |
yes |
May call a virtual method |
yes |
yes |
May call a static method |
yes |
No |
Selects method based on |
compile-time class |
runtime object’s type |
Checks for null |
no |
yes |
So basically, C# emits a callvirt
instead of a call so the null reference check is performed. It then trusts the JIT compiler to recognize when a method isn’t actually virtual, eliminate the v-table lookup, and possibly inline the code. This also has a backwards compatibility aspect. If an instance method is changed into a virtual method, the code that calls it won’t have to be recompiled.
Note: IL code allows you to use call
on a virtual method in order to invoke a base class method (e.g. base.Foo()
) when overriding a method without having to introduce a specialized op-code.
For loops
So we know why using a foreach over a List<int>
is faster than the same list wrapped in an IList<int>
or IEnumerable<int>
. But what if we were to use a for loop instead of a foreach?
private static void Test4(IList list)
{
var sw = Stopwatch.StartNew();
var count = 0;
for (int i = 0; i < list.Count; i++)
count += list[i];
sw.Stop();
Console.WriteLine("List for: " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
}
!0>!0>!0>
This time we’re seeing an average runtime of 698.9598 ms (24.5606 ms StdDev). Slower than our foreach+List case, but still 20% faster than when we started.
private static void Test5(List<int> list)
Testing it again with a for loop over a List<int>, we see the numbers drop by an order of magnitude. With an average runtime of 105.6321 ms (1.1690 ms StdDev), we’re 63% faster than our next best case and 88% faster than our worst case.
What about LINQ?
For our next set of tests, we’ll only be counting the even numbered rows. Common knowledge is LINQ is slower than normal loops because it uses delegates, but is that true?
We ran theses four tests with an 1000 item list:
- IList<int> with LINQ Where clause: list.Where(x => x % 2 == 0)
- IList<int> with an if-statement: if (item % 2 == 0)
- List with LINQ Where clause
- List<int> with an if-statement
Common wisdom is that LINQ is slower than if statements, but that’s not always the case.
Variable |
Conditional |
Mean ms |
StdDev ms |
IList<int> |
LINQ |
868.1882 |
9.2918 |
IList<int> |
if-statement |
911.8603 |
7.4000 |
List<int> |
LINQ |
871.5804 |
12.1582 |
List<int> |
if-statement |
319.2920 |
12.7303 |
The slowest turned out to be IList<int> with an if-statement . When using LINQ, performance actually improved by 4 to 5%. But if we use a List<int> with an if-statement , then we see a 65% improvement of the worst case.
So when choosing between using LINQ and an if-statement when using interface-typed variables, LINQ is the better choice. Why? In part because it discards the interface and goes back to the base class. Here is LINQ’s Where function as shown in Reference Source.
public static IEnumerable Where(this IEnumerable source, Func predicate) {
if (source == null) throw Error.ArgumentNull("source");
if (predicate == null) throw Error.ArgumentNull("predicate");
if (source is Iterator) return ((Iterator)source).Where(predicate);
if (source is TSource[]) return new WhereArrayIterator((TSource[])source, predicate);
if (source is List) return new WhereListIterator((List)source, predicate);
return new WhereEnumerableIterator(source, predicate);
}
!0>!0>!0>
As you can see, it has special cases for several concrete types including List<T>. In fact, this function had no idea we were trying to give it an IList<T>. That information was discarded and then time was spent to determine what the real type was. Hence the reason both LINQ versions have the same runtime characteristics.
This is actually fairly common in the .NET Framework. A common misconception is you can cast a list to IEnumerable<T> or IReadOnlyList<T> and that actually makes the object read-only. In fact, libraries such as LINQ and frameworks such as WPF sniff out the real types underneath and respond accordingly.
Strongly Named Classes
A recommendation in the .NET Framework Design Guidelines is all collections exposed in a public API should be strongly named. For example, instead of exposing a generic IList<Customer>
, you should have a CustomerCollection
class.
The primary reason for this rule is it allows for future extensibility. You can later decide to expose additional properties and methods on the strongly named class in a way consumers can easily find. It also solves the problems caused by IList<T>
not inheriting from IReadOnlyList<T>
.
As of .NET 2.0, the recommended base class for strongly named collections is Collection<T>. This makes for a good look at trades-offs when designing an API. For example, Collection<T>
has overridable methods that are fired when objects are added or removed from the collection. This may make the class slower when calling add and remove methods, but it allows you to add interesting features such as validation.
Unlike List<T>
, Collection<T>
does not expose a struct based enumerator. So enumerating it with a foreach loop will be no different than exposing IEnumerable<T>
. But you do get something in exchange. Consider the constructor for Collection<T>
.
public Collection(IList list) {
if (list == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.list);
}
items = list;
}
Rather than copying the source list, Collection<T> will use it as-is. This makes converting a List<T> into a Collection<T> very cheap, which is useful when you want to add additional functionality to a populated list before exposing it to consumers.
Immutable vs Read-Only Collections
Like Collection<T>
, ReadOnlyCollection<T>
is just a wrapper around another list. So you should see the same runtime performance characteristics. But what about the immutable collections: ImmutableList<T>
and ImmutableArray<T>
?
In a test of 1,000 integers we saw these averages. (Note that the time is in micro-seconds.)
Class |
For (us) |
For-Each (us) |
int[] |
0.647 |
0.517 |
List<int> |
1.4776 |
2.8544 |
Collection<int> |
6.9078 |
8.2498 |
ReadOnlyCollection<int> |
6.9520 |
8.2494 |
ImmutableArray<int> |
0.954 |
2.7449 |
ImmutableList<int> |
50.6617 |
62.3762 |
Performance wise, an ImmutableArray<T> is clearly a substitute for List<T>. And ReadOnlyCollection<T> is about the same speed as Collection<T>. But what’s up with ImmutableList<T>?
The source code isn’t available on Reference Source, but if you understand the concept behind the immutable list it starts to make sense. While ReadOnlyCollection<T> and ImmutableArray<T> are merely wrappers around an IList<T> and an array respectively, ImmutableList<T> is a rather complex tree structure. Here is a decompiled version of the enumerator for ImmutableList<T>.
public bool MoveNext()
{
this.ThrowIfDisposed();
this.ThrowIfChanged();
if (this._stack != null)
{
Stack.Node>> stack = this._stack.Use.Enumerator>(ref (ImmutableList.Enumerator) ref this);
if ((this._remainingCount > 0) && (stack.get_Count() > 0))
{
ImmutableList.Node node = stack.Pop().Value;
this._current = node;
this.PushNext(this.NextBranch(node));
this._remainingCount--;
return true;
}
}
this._current = null;
return false;
}
!0>!0>!0>
The ImmutableList<T>
is useful, but only for very specific use cases beyond the scope of this article.
Conclusions
If you want convenience or the flexibility to make future changes, expose a strongly named type that inherits from Collection<T>
or ReadOnlyCollection<T>
. If performance is a more important concern, design your API to use arrays, List<T>
or ImmutableArray<T>
.
Avoid IEnumerable<T>
, IList<T>
, and IReadOnlyList<T>
for property and method return types if you can, and don’t use ImmutableList<T> unless you are actively manipulating immutable collections.
The source code for this article is available at https://github.com/Grauenwolf/ForEachTimings.
About the Author
Jonathan Allen got his start working on MIS projects for a health clinic in the late 90's, bringing them up from Access and Excel to an enterprise solution by degrees. After spending five years writing automated trading systems for the financial sector, he became a consultant on a variety of projects including the UI for a robotic warehouse, the middle tier for cancer research software, and the big data needs of a major real estate insurance company. In his free time he enjoys studying and writing about martial arts from the 16th century.