Enumerating Concurrent Collections

Parallel Extensions for .NET include two concurrent collections: a stack and a queue. Other collection classes are on their way, but there are still lingering questions about semantics. Specifically, what happens when a collection is modified by one thread while being enumerated by another thread?

For single-threaded classes the answer is well known, the enumerator throws an exception. But for concurrent classes where this is supposed to be the norm, several options exist.

One option would be to create a snapshot of the data as it exists when the enumeration is started. This would require a lot more memory during iteration, but once created this snapshot will be lock free. The performance implications of this are hard to predict.

Without snapshots, there is a set of guarantees a collection may offer to developers. Many of these are contradictory but they all appropriate under some circumstances.

Deleted items will always be seen
Deleted items will never be seen
Added items will always be seen if added at the end of the collection
Added items will always be seen if added wherever they are added
Added items will always never be seen
Moved items will never be seen twice
Moved items will be seen twice, if moved to the end of the collection
Moved items will always be seen, even if moved to the beginning of the collection
No more than N items will be seen, where N is the original length of the collection

Given the wide range of options, Stephen Toub is asking for feedback for Parallel Extensions:

Given the standard array of collections that exist in .NET, if you had thread-safe versions of them, are there scenarios where you would benefit from being able to enumerate concurrently with other threads modifying the same collection? We already know about some, but we'd love to hear about more.

Assuming you do have scenarios for #1, what are the minimum guarantees you'd need about the data returned in the enumerator for it to be useful? For example, you could say for a thread-safe dictionary that if there are no concurrent modifications (adds/updates/removals), you'll get back in the enumeration exactly what's in the dictionary, and if there are concurrent accesses, you'll never get back something that wasn't in the dictionary, you may not see concurrent adds or updates, and you may still see items that are concurrently removed.

If we did support enumeration on collections that were thread-safe in the face of concurrent modifications, would you ever want the ability to revert to the "throw on modifications" behavior?

Finally, what are the most important collections that you'd like to see thread-safe and scalable counterparts for?

Omer van Kloeten ha a great question about serialization,

A great addition to that would be to be able to asynchronously dump that cache from memory to persistent storage at times for backup, which would require safe enumeration. (Actually thinking about it - will serialization be thread-safe as well? What will be the default behavior there?)

Rick Brewster proposes we turn to functional programming concepts. Instead of locking the collection externally, all operations against it are performed by passing in a delegate to a "Lock(Action)" method.

The idea being that the 'Lock' method automatically surrounds your callback with a lock/unlock type of construct. This way you are also not limited to the atomic operations that the class currently exposes. In fact, maybe instead of Lock() taking an Action, it takes a Func<ConcurrentListData>, and the ConcurrentList class itself doesn't actually have any methods for get, set, enumerate. This way you can ONLY access the class within a locked scope.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Parallel Programming topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter