Next in our series on the API changes for .NET 6, we look at collections.
List, Stack, and Queue Capacity
Before performing a large set of inserts into a Dictionary or HashSet, it is helpful to call EnsureCapacity
with the expected collection size. This allows the collection to perform one resize operation upfront, avoiding the possibility of multiple resizes being needed.
The EnsureCapacity method has been added to the List<T>, Stack<T>, and Queue<T> classes so they too can gain the performance benefit.
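For example, when the number of inserts is known in advance, a single EnsureCapacity call replaces the repeated internal resizes that would otherwise occur as the list grows. A minimal sketch; the item count here is hypothetical.

var expectedCount = 10_000; // known or estimated number of items to add
var list = new List<int>();

// One resize upfront instead of repeated doubling during the loop
list.EnsureCapacity(expectedCount);

for (var i = 0; i < expectedCount; i++)
    list.Add(i);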
A notable exclusion from this group is Collection<T>. Unlike the others, Collection<T> may optionally wrap another collection, which may not necessarily expose an EnsureCapacity method. As a subclass of Collection<T>, ObservableCollection<T> cannot expose an EnsureCapacity method either.
This isn’t the only design flaw in Collection<T> and ObservableCollection<T>. The lack of an AddRange method has long annoyed developers, and neither class offers the high-performance, struct-based IEnumerator that List<T> provides. So perhaps it’s time to offer a new alternative that doesn’t have the IList<T> wrapping constructor and its associated problems.
Mutable Structs and Dictionaries
Though rare, occasionally developers need to work with mutable structs. This can be difficult because it is easy to accidentally copy a mutable struct, resulting in two values that are no longer kept in sync.
A workaround is to intentionally copy the struct, make the alterations, then copy it back into the original location. For small structs this isn’t too bad, but it can be expensive for larger ones. And since performance is often cited as the reason for using mutable structs in the first place, the copy-out/copy-in pattern is counterproductive.
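To make the pattern concrete, here is a minimal sketch using a hypothetical Point3D struct stored in a List<T>; the type and method names are invented for the example.

// A hypothetical mutable struct used only for illustration
struct Point3D { public double X, Y, Z; }

static void MoveFirstPoint(List<Point3D> points)
{
    // points[0].X = 2; // CS1612: the List<T> indexer returns a copy, not a reference
    var temp = points[0]; // copy out
    temp.X = 2;           // mutate the copy
    points[0] = temp;     // copy back in
}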
To avoid these unnecessary copies, mutable structs were usually stored in arrays. Unlike the indexer property in a List<T>, array elements are accessed directly.
In C# 7, reference return values (ref returns) were introduced. These allow an indexer to return a reference to a struct rather than a copy.
public ref int this[int index]
{
    get { return ref _internalArray[index]; }
}
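Given such an indexer, callers can modify the element in place through the returned reference, even though the indexer has no setter. In the sketch below, myCollection stands for any hypothetical type exposing the indexer shown above.

// Mutate the element directly; no copy-out/copy-in required
myCollection[5]++;

// Or hold the reference for several operations
ref int slot = ref myCollection[5];
slot += 10;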
This technique is used by the Span<T> struct starting in .NET Core 2.1. And as of .NET 5, the CollectionsMarshal.AsSpan method makes it easy to get a span wrapper around a List<T>.
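Combining AsSpan with a ref local makes it possible to update structs stored in a List<T> without any copying. A rough sketch, reusing the hypothetical Point3D struct from the earlier example:

using System.Runtime.InteropServices;

var points = new List<Point3D> { new Point3D { X = 1 } };

// The span aliases the list's internal array, so items must not be
// added to or removed from the list while the span is in use.
Span<Point3D> span = CollectionsMarshal.AsSpan(points);
ref Point3D p = ref span[0];
p.X = 2; // mutates the element stored inside the list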
In order to offer the same capabilities for dictionaries, a new function called CollectionsMarshal.GetValueRefOrNullRef was created. There are a few interesting facts about this function.
First of all, it’s not an extension method. The developers feared it would be too easy to use incorrectly, so they intentionally wanted to make it hard to find. (The AsSpan function is also potentially unsafe to use, as collection sizes cannot be modified while the Span<T> exists.)
Another is that the function already existed as the internal method Dictionary<TKey, TValue>.FindValue. They are merely exposing it via GetValueRefOrNullRef.
public static ref TValue GetValueRefOrNullRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, TKey key) where TKey : notnull
=> ref dictionary.FindValue(key);
FindValue itself uses some techniques that are not common in C# programming. Here are the last few lines, where you can find the liberal use of goto statements, including backwards-jumping gotos.
            // The chain of entries forms a loop; which means a concurrent update has happened.
            // Break out of the loop and throw, rather than looping forever.
            goto ConcurrentOperation;
        }
    }

    goto ReturnNotFound;

ConcurrentOperation:
    ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
ReturnFound:
    ref TValue value = ref entry.value;
Return:
    return ref value;
ReturnNotFound:
    value = ref Unsafe.NullRef<TValue>();
    goto Return;
}
After calling GetValueRefOrNullRef, one has to use CompilerServices.Unsafe.IsNullRef to see whether the result is a reference to an actual value or a null reference. This is because C# doesn’t have syntax for checking if a ref to a struct is null.
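The pattern looks roughly like the sketch below, which assumes a hypothetical Dictionary<string, int> of counters.

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var counts = new Dictionary<string, int> { ["apples"] = 1 };

// Returns a reference into the dictionary's internal storage, or a null ref if the key is missing
ref int count = ref CollectionsMarshal.GetValueRefOrNullRef(counts, "apples");

if (!Unsafe.IsNullRef(ref count))
    count++; // updates the value inside the dictionary without a second lookup
else
    Console.WriteLine("Key not found");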
Serialization Enhancements for Dictionaries
While serializing list-style collections is fairly straightforward, collections that require a hashing component, such as Dictionary and HashSet, pose additional challenges. One such challenge is the need to serialize not only the contents, but also the comparison algorithm used. That way the deserializer knows whether it should create the collection with a comparer that is ordinal or culture-specific, and case-sensitive or case-insensitive.
Since the vast majority of these problems revolve around string-based keys, it was decided to add two new methods to StringComparer.
public static bool IsWellKnownOrdinalComparer(IEqualityComparer<string?>? comparer, out bool ignoreCase);
public static bool IsWellKnownCultureAwareComparer(IEqualityComparer<string?>? comparer, [NotNullWhen(true)] out CompareInfo? compareInfo, out CompareOptions compareOptions);
By calling these two functions, a serializer can gather the necessary information for later deserialization. While this won’t cover all scenarios, it is rare to have a custom comparer that isn’t based on a well-known culture.
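A serializer might use them along these lines; the helper method below is a hypothetical sketch of how the comparer could be mapped to a serializable description.

using System.Globalization;

static string DescribeComparer(IEqualityComparer<string?>? comparer)
{
    if (StringComparer.IsWellKnownOrdinalComparer(comparer, out bool ignoreCase))
        return ignoreCase ? "OrdinalIgnoreCase" : "Ordinal";

    if (StringComparer.IsWellKnownCultureAwareComparer(comparer, out CompareInfo? compareInfo, out CompareOptions options))
        return $"Culture={compareInfo.Name};Options={options}";

    return "Custom"; // a comparer the serializer cannot round-trip automatically
}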
This isn’t, however, a complete solution. Since neither SOAP XML nor JSON allows properties to be associated with collections, serializers will still need to devise a custom way to store the information.
Priority Queue
The new PriorityQueue class is designed for scenarios where a developer needs a queue in which items are ordered by a priority score. There are several ways this can be implemented, many of which are contingent on the question, “Can an item’s priority be changed?”
For .NET’s PriorityQueue, it was decided not to allow priority changes. The team determined that a fixed priority score would result in performance “2-3x faster” than one with a mutable priority.
In order to ensure the priority score can’t be changed, it is kept separate from the enqueued object, as shown below.
public void Enqueue(TElement element, TPriority priority);
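Basic usage looks like the sketch below; with the default comparer, elements with the lowest priority value are dequeued first. The items shown are just placeholders.

var queue = new PriorityQueue<string, int>();

queue.Enqueue("send newsletter", priority: 5);
queue.Enqueue("process payment", priority: 1);
queue.Enqueue("resize images", priority: 3);

// Dequeues in ascending priority order: process payment, resize images, send newsletter
while (queue.TryDequeue(out var element, out var priority))
    Console.WriteLine($"{priority}: {element}");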
The next design question is stability. If you add two items to the queue with the same priority and they are guaranteed to be dequeued in the order they were added, then the queue is considered stable. .NET decided against guaranteeing a stable queue, so that if an unstable queue algorithm is faster, they have the option to choose it.
Another design question is whether or not to allow the queue to be enumerated. At first glance this may seem like an odd question. Why wouldn’t you want to be able to use a for-each loop on a queue?
Well, the first problem is the implied contract for IEnumerable. Most developers assume you can enumerate the same collection twice and get the same results both times. This assumption is baked into common patterns where it is not always obvious that enumeration is even occurring. Consider this example:
public static TElement[] CopyToArray<TElement>(this IEnumerable<TElement> source)
{
    var count = source.Count();
    var result = new TElement[count];
    var i = 0;
    foreach (var item in source)
        result[i++] = item;
    return result;
}
The expression source.Count() could enumerate the items in the queue to get the count, draining it in the process. That in turn means the for-each loop would see an empty queue and the array would never be filled.
One could make the enumerator non-destructive, but that has its own problems. Internally, the priority queue is not necessarily ordered. This means the enumerator would have to track which items it has already seen. If you have to run multiple enumerators simultaneously for some reason, each could end up with a complete copy of the queue.
Thus, it was decided that the PriorityQueue class wouldn’t implement IEnumerable<T>, which in turn means it also can’t implement ICollection<T> or other such interfaces.
If you do want to use a PriorityQueue with a for-each loop, you can use the extension method below.
public static IEnumerable<TElement> GetDequeuingEnumerator<TElement, TPriority>(this PriorityQueue<TElement, TPriority> queue)
{
    // TryDequeue has two out parameters; the priority is discarded here
    while (queue.TryDequeue(out var item, out _))
        yield return item;
}
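With that extension method in place, a for-each loop drains the queue in priority order. Note that enumerating this way consumes the queue, leaving it empty afterwards.

foreach (var element in queue.GetDequeuingEnumerator())
    Console.WriteLine(element);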
For our previous reports in the series, see the links below: