.NET 6 LINQ Improvements

Continuing our series on the more than 100 API changes in .NET 6, we look at extensions to the LINQ library.

Indexing Operations on IEnumerable<T>

Originally, one of the distinguishing characteristics of IEnumerable<T> versus IList<T> was that the latter supported indexed operations such as retrieving the 5th element in the collection. The idea was that only collections that supported fast index operations (at or near O(1)) would implement IList<T>. In theory you would never even try to perform indexed operations on an IEnumerable<T> because it would be assumed to be slow.

With the introduction of LINQ, many of those assumptions faded. Extension methods such as Enumerable.Count() and Enumerable.ElementAt() made it possible to treat any enumerable collection as if it had a known count and fast indexing, even if it really just counted every element.

These three new extension methods continue that trend.

public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, Index index);
public static TSource ElementAtOrDefault<TSource>(this IEnumerable<TSource> source, Index index);
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, Range range);

The Range type corresponds to the fairly new C# range syntax. For example, the following skips the first 10 and the last 10 elements:

var elements = source2.Take(range: 10..^10);
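
To make the Index and Range overloads more concrete, here is a minimal sketch, assuming .NET 6 or later. The variable names and the sample sequence are illustrative only and do not come from the API proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A lazily evaluated sequence with no indexer: the numbers 0 through 99.
IEnumerable<int> source = Enumerable.Range(0, 100).Where(n => n >= 0);

// ^1 means "first from the end", i.e. the last element.
int last = source.ElementAt(^1);

// An index outside the sequence yields default(int) instead of throwing.
int missing = source.ElementAtOrDefault(^500);

// Skip the first 10 and the last 10 elements.
int[] middle = source.Take(10..^10).ToArray();

Console.WriteLine($"{last}, {missing}, {middle.Length}, {middle[0]}..{middle[^1]}");
```

Because the sequence has no indexer, the from-the-end forms still have to walk the sequence to find its end, so these calls are convenient rather than fast.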

Counting Operations on IEnumerable<T>

When one calls .Count() on an IEnumerable<T>, one of two things happens. The LINQ library first tries to cast it to an interface that exposes a Count property, such as ICollection<T>. If it can’t do that, then it iterates through the entire sequence, counting the items as it goes.

For large collections, this counting can be very expensive. If an IQueryable is involved, say for a database query, it could conceivably take minutes or longer to get the count. So developers have asked for a “safe” counting function. This function would check for a fast Count property and, if it can’t find one, report failure rather than enumerate the sequence. This new function is defined as:

public static bool TryGetNonEnumeratedCount<TSource>(this IEnumerable<TSource> source, out int count);

The long name, TryGetNonEnumeratedCount, is a bit of a hint to the developer that they are doing something odd. Ideally, any API that returns a list would return either a strongly named collection (e.g. CustomerCollection), a high performance List<T>, or a higher level interface such as IList<T> or IReadOnlyList<T>. Doing any of these would eliminate the need for TryGetNonEnumeratedCount, but sometimes the developer doesn’t have control over the API they are calling.
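
As a rough illustration of the difference, the sketch below (assuming .NET 6 or later) calls TryGetNonEnumeratedCount on a List<T> and on the result of an iterator method. The Generate helper is hypothetical and exists only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A List<T> exposes a Count property, so no enumeration is needed.
IEnumerable<int> list = new List<int> { 1, 2, 3 };
bool listKnown = list.TryGetNonEnumeratedCount(out int listCount);

// An iterator method offers no cheap count, so the call reports failure
// and sets the out parameter to zero rather than walking the sequence.
IEnumerable<int> lazy = Generate();
bool lazyKnown = lazy.TryGetNonEnumeratedCount(out int lazyCount);

Console.WriteLine($"list: {listKnown} ({listCount}), lazy: {lazyKnown} ({lazyCount})");

// Hypothetical helper used only for this illustration.
static IEnumerable<int> Generate()
{
    yield return 1;
    yield return 2;
}
```

A caller would typically use the result to pre-size a buffer when the count is cheap and fall back to a growable collection when it is not.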

Three-way Zip Extension Method

The Zip extension method combines two sequences by enumerating both simultaneously. For example, if you had the list 1, 2, 3 and the list A, B, C, then the resulting sequence would be (1, A), (2, B), (3, C).

With the “More arities for tuple returning zip extension method” proposal, you gain the ability to directly combine three sequences at once.

public static IEnumerable<(TFirst First, TSecond Second, TThird Third)> Zip<TFirst, TSecond, TThird>(this IEnumerable<TFirst> first, IEnumerable<TSecond> second, IEnumerable<TThird> third);
public static IQueryable<(TFirst First, TSecond Second, TThird Third)> Zip<TFirst, TSecond, TThird>(this IQueryable<TFirst> source1, IEnumerable<TSecond> source2, IEnumerable<TThird> source3);

Strictly speaking, this new version of Zip isn’t necessary as one could simply apply .Zip(…) multiple times. But the reviewers decided the three-sequence version happens often enough to justify including it in the library. Other arities such as 4 or 5 were rejected as not being common enough.

Note: The unusual word arity (plural arities) means the number of arguments or operands taken by a function. For this Zip extension method, the arity is 3.
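
The following sketch, assuming .NET 6 or later, contrasts the three-sequence overload with chaining the two-sequence Zip. The sample data is made up for illustration.

```csharp
using System;
using System.Linq;

int[] ids = { 1, 2, 3 };
string[] names = { "A", "B", "C" };
bool[] flags = { true, false, true };

// Three-way Zip: each element is a tuple with First, Second, and Third.
var zipped = ids.Zip(names, flags).ToList();

// The same result before .NET 6 meant zipping twice and flattening the nested pair.
var nested = ids.Zip(names)
                .Zip(flags, (pair, flag) => (pair.First, pair.Second, flag))
                .ToList();

foreach (var item in zipped)
{
    Console.WriteLine($"{item.First}, {item.Second}, {item.Third}");
}
```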

Batching Sequences

Often developers need to break up a sequence into discrete batches or chunks. For example, they may find that sending 100 rows to the database at a time offers better performance than sending one at a time or all of them at once.

While this code isn’t particularly difficult to write, it tends to be error prone. It’s really easy to make a mistake in the last batch if the item count isn’t evenly divisible by the batch size. So as a convenience, the Chunk extension method was added.

public static IEnumerable<TSource[]> Chunk<TSource>(this IEnumerable<TSource> source, int size);
public static IQueryable<TSource[]> Chunk<TSource>(this IQueryable<TSource> source, int size);

If you don’t want to wait for .NET 6, the open source library MoreLINQ includes this operation under the name Batch.
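
A small sketch of the last-batch behavior, assuming .NET 6 or later. Seven items and a batch size of three are arbitrary values chosen so that the final batch comes up short.

```csharp
using System;
using System.Linq;

// Seven items in batches of three: the last batch holds the single leftover item.
int[][] batches = Enumerable.Range(1, 7).Chunk(3).ToArray();

foreach (int[] batch in batches)
{
    Console.WriteLine(string.Join(", ", batch));
}
```

Each batch is a plain array, so its Length tells the caller how many items it actually holds.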

Note that for this feature, and for the Zip extension method above, there are both IEnumerable and IQueryable versions. All new APIs that return an IEnumerable are required to include a matching IQueryable version. This prevents someone from changing a query into a normal sequence without realizing it.

Analyzer Checks

When the API itself can’t prevent developers from using code incorrectly, library authors are increasingly turning to analyzers. Some of these are built into the C# compiler, others are added via libraries such as NetAnalyzers (formerly FxCop) and the third-party Roslynator.

The first of the new analyzers deals with the OfType<T> extension method. This filters an input sequence, only returning items of the indicated type T. If the input type cannot be cast to the output type, the current behavior is to just return an empty sequence. With the “do not use OfType<T>() with impossible types” proposal, the developer will instead get a compiler warning.
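
To show the kind of mistake this analyzer is aimed at, here is a minimal sketch. The data is invented for the example, and whether a warning actually appears depends on the SDK and analyzer versions in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> numbers = new List<int> { 1, 2, 3 };

// No int can ever be a string, yet this compiles and silently yields nothing.
int matches = numbers.OfType<string>().Count();

Console.WriteLine(matches);
```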

The “use AsParallel() correctly” proposal addresses the situation where someone calls AsParallel() and then immediately begins to enumerate the sequence. Though it isn’t obvious from the API, AsParallel() must appear before any operations that can be parallelized such as mapping and filtering. Again, this mistake would be reported as a compiler warning.
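
The two placements can be sketched as follows. This is an illustration with made-up data, not the analyzer's own test case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] data = Enumerable.Range(1, 1000).ToArray();

// AsParallel() comes first, so the filter and the map below are PLINQ
// operators and can be spread across multiple cores.
int parallelSum = data.AsParallel()
                      .Where(n => n % 2 == 0)
                      .Select(n => n * 2)
                      .Sum();

// The pattern the analyzer targets: Where and Select here are ordinary
// sequential LINQ operators, and AsParallel() is added just before enumeration.
List<int> lateParallel = data.Where(n => n % 2 == 0)
                             .Select(n => n * 2)
                             .AsParallel()
                             .ToList();

Console.WriteLine($"{parallelSum}, {lateParallel.Count}");
```

Both forms produce the same values, which is why the mistake is easy to miss without an analyzer.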

*By Operators

The *By operators refer to DistinctBy, ExceptBy, IntersectBy, UnionBy, MinBy, and MaxBy. For the first four, a keySelector is provided. This allows the comparison part of the operation to be performed on a subset of the value rather than the entire value. This can be used to improve performance or to provide custom behavior without losing the original value. An optional comparer may also be provided.

In the case of MinBy and MaxBy, a selector is provided instead of a keySelector. Again, an optional comparer may be provided. (For completeness, the Min and Max operators also accept an optional comparer now.)

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst, IEqualityComparer<TKey>? comparer);

public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst);
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TKey> second, Func<TSource, TKey> keySelectorFirst, IEqualityComparer<TKey>? comparer);

public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector);
public static IEnumerable<TSource> UnionBy<TSource, TKey>(this IEnumerable<TSource> first, IEnumerable<TSource> second, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? comparer);

public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
public static TSource MinBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);

public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);
public static TSource MaxBy<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, IComparer<TResult>? comparer);

public static TSource? Min<TSource>(this IEnumerable<TSource> source, IComparer<TSource>? comparer);

public static TSource? Max<TSource>(this IEnumerable<TSource> source, IComparer<TSource>? comparer);

Each IEnumerable method has a matching IQueryable method with the same signature.
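
A short sketch of a few of these operators, assuming .NET 6 or later. The people array is invented sample data, and the ExceptBy call uses the overload that takes a sequence of keys as its second argument.

```csharp
using System;
using System.Linq;

var people = new[]
{
    (Name: "Ann", Age: 31),
    (Name: "Bob", Age: 25),
    (Name: "Cid", Age: 31),
    (Name: "Dee", Age: 40),
};

// DistinctBy compares only the key (Age) but returns the whole element.
string[] onePerAge = people.DistinctBy(p => p.Age).Select(p => p.Name).ToArray();

// MinBy and MaxBy return the element itself, not the selected value.
var youngest = people.MinBy(p => p.Age);
var oldest = people.MaxBy(p => p.Age);

// ExceptBy removes every element whose key appears in the second sequence.
string[] not31 = people.ExceptBy(new[] { 31 }, p => p.Age).Select(p => p.Name).ToArray();

Console.WriteLine(string.Join(", ", onePerAge));
Console.WriteLine($"{youngest.Name}, {oldest.Name}");
Console.WriteLine(string.Join(", ", not31));
```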

*OrDefault Enhancement

The *OrDefault operator variant is used to provide a default value when an empty enumeration is sent to the Single, First, or Last operator. In this feature, the default value returned can now be overridden.

public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source, TSource defaultValue);
public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate, TSource defaultValue);

public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, TSource defaultValue);
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate, TSource defaultValue);

public static TSource LastOrDefault<TSource>(this IEnumerable<TSource> source, TSource defaultValue);
public static TSource LastOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate, TSource defaultValue);

Again, each IEnumerable method has a matching IQueryable method with the same signature.
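
For instance, a sentinel such as -1 can stand in for "not found" where default(int), which is 0, might be a legitimate value. A minimal sketch, assuming .NET 6 or later and made-up data:

```csharp
using System;
using System.Linq;

int[] empty = Array.Empty<int>();
int[] values = { 3, 8, 12 };

// Empty source: the supplied default is returned instead of default(int).
int firstOfEmpty = empty.FirstOrDefault(-1);

// The predicate matches, so the default is ignored.
int firstBig = values.FirstOrDefault(n => n > 10, -1);

// The predicate matches nothing, so the supplied default is returned.
int lastHuge = values.LastOrDefault(n => n > 100, -1);

// SingleOrDefault on an empty source also returns the supplied default.
int singleOfEmpty = empty.SingleOrDefault(-1);

Console.WriteLine($"{firstOfEmpty}, {firstBig}, {lastHuge}, {singleOfEmpty}");
```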

Community comments

  • Batching Sequences

    by Philip Lee,

    System.Interactive already has the Buffer extension method (borrowed from Rx) which is exactly the same as Chunk. The big improvement in .NET 6 is the implementation for IQueryable.