InfoQ

News

Enumerating Concurrent Collections

Posted by Jonathan Allen on Aug 15, 2008 06:58 AM

Community
.NET
Topics
Performance & Scalability
Tags
Parallel Programming

Parallel Extensions for .NET include two concurrent collections: a stack and a queue. Other collection classes are on their way, but there are still lingering questions about semantics. Specifically, what happens when a collection is modified by one thread while being enumerated by another thread?

For single-threaded classes the answer is well known, the enumerator throws an exception. But for concurrent classes where this is supposed to be the norm, several options exist.

One option would be to create a snapshot of the data as it exists when the enumeration is started. This would require a lot more memory during iteration, but once created this snapshot will be lock free. The performance implications of this are hard to predict.

Without snapshots, there is a set of guarantees a collection may offer to developers. Many of these are contradictory but they all appropriate under some circumstances.

  • Deleted items will always be seen
  • Deleted items will never be seen
  • Added items will always be seen if added at the end of the collection
  • Added items will always be seen if added wherever they are added
  • Added items will always never be seen
  • Moved items will never be seen twice
  • Moved items will be seen twice, if moved to the end of the collection
  • Moved items will always be seen, even if moved to the beginning of the collection
  • No more than N items will be seen, where N is the original length of the collection

Given the wide range of options, Stephen Toub is asking for feedback for Parallel Extensions:

  1. Given the standard array of collections that exist in .NET, if you had thread-safe versions of them, are there scenarios where you would benefit from being able to enumerate concurrently with other threads modifying the same collection? We already know about some, but we'd love to hear about more.
  2. Assuming you do have scenarios for #1, what are the minimum guarantees you'd need about the data returned in the enumerator for it to be useful? For example, you could say for a thread-safe dictionary that if there are no concurrent modifications (adds/updates/removals), you'll get back in the enumeration exactly what's in the dictionary, and if there are concurrent accesses, you'll never get back something that wasn't in the dictionary, you may not see concurrent adds or updates, and you may still see items that are concurrently removed.
  3. If we did support enumeration on collections that were thread-safe in the face of concurrent modifications, would you ever want the ability to revert to the "throw on modifications" behavior?
  4. Finally, what are the most important collections that you'd like to see thread-safe and scalable counterparts for?

Omer van Kloeten ha a great question about serialization,

A great addition to that would be to be able to asynchronously dump that cache from memory to persistent storage at times for backup, which would require safe enumeration. (Actually thinking about it - will serialization be thread-safe as well? What will be the default behavior there?)

Rick Brewster proposes we turn to functional programming concepts. Instead of locking the collection externally, all operations against it are performed by passing in a delegate to a "Lock(Action)" method.

The idea being that the 'Lock' method automatically surrounds your callback with a lock/unlock type of construct. This way you are also not limited to the atomic operations that the class currently exposes. In fact, maybe instead of Lock() taking an Action, it takes a Func<ConcurrentListData>, and the ConcurrentList class itself doesn't actually have any methods for get, set, enumerate. This way you can ONLY access the class within a locked scope.

No comments

Watch Thread Reply

Educational Content

Bindings, Platforms, and Innovation

This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.

Orchestrating Long Running Activities with JBoss / JBPM

This article explores the use of JBoss and jBPM to implement design solutions that effectively address the issue of orchestrating long running activities.

Neo4j - The Benefits of Graph Databases

This presentation covers the use of graph databases as an optimal solution for data that is difficult to fit in static tables, rapidly evolving data or data that has a lot of optional attributes.

Realistic about Risk: Software development with Real Options

This session introduces Real Options and shows how it can help in running your project. Real Options is a decision-making process that can be used to manage risk.

Communication Flexibility Using Bindings

This article discusses the use of bindings on services and references (including the instance of non-configured bindings) as the means to implement SCA communications in a Web and SOA environment.

Writing DSLs in Groovy

After a short introduction to DSLs, Scott Davis plays with the keyboard showing how to approach the creation of a DSL by typing working snippets of Groovy code that get executed.

Scaling Agile with C/ALM (Collaborative Application Lifecycle Management)

IBM Rational and InfoQ present, Scaling Agile with C/ALM, an eBook showing organizations how to become “finely tuned software delivery machines” by enabling team integration and scaling.

Concurrent Programming with Microsoft F#

Amanda Laucher presents a real life enterprise application written in F#. She shows actual code snippets, explaining design decisions and suggesting how to use some of the F# constructs.