More on Immutable Collections in .NET
Since we last reported on Immutable Collections in January, the API has evolved and a lot more has been revealed about the inner workings. First a rundown of what’s changed in the most recent release:
Though the immutable collections still do not offer constructors, the use of an Empty object is no longer necessary. Previously you would see code like this:
var list = ImmutableList<int>.Empty.Add(1, 2, 3);
With the new release we get a static factory method called Create. This allows generic type inference to be used, shortening the expression to:
var list = ImmutableList.Create(1, 2, 3);
A hotly debated topic is the implementation of the IList<T> interface. Proponents of the interface say that it is necessary for interoperability with libraries that predate the introduction of IReadOnlyList<T>. Critics complain that the same legacy libraries don’t necessarily check to see if IList.IsReadOnly is false before attempting to modify values.
In the end the BCL team bowed down to legacy concerns and implemented IList<T>. While everyone agrees that it would have been better if IList.IsReadOnly never existed, at this point there is too much momentum behind it.
For a complete list of immutable classes and the interfaces they expose see this compatibility chart.
Like other collection types, the immutable collections will only support reference equality. The BCL Team writes,
Value equality on collections can be fairly expensive to compute and comparing for equality on nested collections, such as ImmutableDictionary<string, ImmutableList<string>> is more difficult to define. Finally, providing this functionality leads to more issues when different comparers are involved, as customers have pointed out.
Previously the collections did override Object.Equals but not op_equals.
Some have asked about supporting IStructuralEquatable. The BCL team has declined to support that interface because it is “hard to generalize”. For example, in some scenarios it may be appropriate to skip over items in the collection (e.g. whitespace nodes in a parser), which wouldn’t be possible with a non-specific implementation.
Unfortunately the design of the immutable classes requires them to be sealed, preventing the use of inheritance to add IStructuralEquatable after the fact.
The immutable collections library is designed for .NET 4.5 and later. It is designed to take advantage of the new read-only interfaces and the developers are not interested in maintaining a separate version for older libraries. It is also available for the Windows 8 and the “portable-net45+win8” profiles.
The legacy serialization design using the Serializable attribute will not be supported by the immutable collections. At this time there is no word on whether or not other serialization designs such as DataContractSerializer will be supported.
Immutable collections (except stack and queue) are based on AVL trees. This allows insertions in the beginning, middle, or end of the list without recopying the entire tree. In the trees section of the Wikipedia article on persistent data structures you can see an example of such an insert.
Immutable hash tables also use AVL trees. Instead of the bucket design of normal hash tables, which perform a modulus operation on the hash code, these actually sort the tree according to the raw hash code. This means retrieval requires a binary search with an average retrieval time of O(log n).
Keep in mind that the Big-O notation is misleading when using multi-threaded operations. The alternative to the immutable collections are the concurrent collections, which require expensive internal locking to ensure thread safety.
An interesting feature of the immutable collections is that their internal nodes are not immutable. In order to reduce the amount of garbage created while constructing a collection, each node begins in an editable state. This allows the constructor to make changes to the existing AVL tree as it adds nodes instead of discarding and recreating it. Once construction is complete and the immutable wrapper returned the nodes are marked as frozen, preventing further changes.
Another surprising design decision is the object pool used by the enumerator. In .NET many of the enumerators are designed to be allocation free. If you get an enumerator from an IList<T> two allocations are needed. But with a List<T>, the enumerator is a struct and no allocations are necessary.
Likewise, the immutable collections use structs for the enumerator. But because the internal structure is a tree, the enumerator needs a stack to include a stack of previously visited nodes so that it can back-track. In order to reduce allocations, a set of these stacks is stored in an object pool (actually a stack) protected by a single lock. In fact, this is the only lock in the entire immutable collection library. It is critical that dispose is called on the enumerator, or else the stack will not be returned to the object pool.
For more information see the Channel 9 video titled Inner Workings of Immutable Collections.
When creating an immutable collection, it is best to create the collection all at once using the Create function. This will allow it to pre-allocate the tree and directly populate the nodes. The second-best method is to use a builder, which doesn’t freeze nodes until you call ToImmutable.
When enumerating items in an immutable collection, always prefer a foreach loop. Due to the internal tree structure, this will be much faster than a for loop. (Side note: Since .NET 2.0, even normal lists can be read faster with foreach than with for.)
If a collection is never mutated after being created, then an immutable collection will have poorer performance than a normal collection protected by a read-only wrapper. Immutable collections are better when you want to efficiently create new collections that are slightly different than another collection.
John Krewson, Steve Ropa and Matt Badgley Nov 24, 2014