InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Enumerating Concurrent Collections

Posted by Jonathan Allen on Aug 15, 2008

Sections
Development,
Architecture & Design
Topics
.NET ,
Performance & Scalability
Tags
Parallel Programming

Parallel Extensions for .NET include two concurrent collections: a stack and a queue. Other collection classes are on their way, but there are still lingering questions about semantics. Specifically, what happens when a collection is modified by one thread while being enumerated by another thread?

For single-threaded classes the answer is well known, the enumerator throws an exception. But for concurrent classes where this is supposed to be the norm, several options exist.

One option would be to create a snapshot of the data as it exists when the enumeration is started. This would require a lot more memory during iteration, but once created this snapshot will be lock free. The performance implications of this are hard to predict.

Without snapshots, there is a set of guarantees a collection may offer to developers. Many of these are contradictory but they all appropriate under some circumstances.

  • Deleted items will always be seen
  • Deleted items will never be seen
  • Added items will always be seen if added at the end of the collection
  • Added items will always be seen if added wherever they are added
  • Added items will always never be seen
  • Moved items will never be seen twice
  • Moved items will be seen twice, if moved to the end of the collection
  • Moved items will always be seen, even if moved to the beginning of the collection
  • No more than N items will be seen, where N is the original length of the collection

Given the wide range of options, Stephen Toub is asking for feedback for Parallel Extensions:

  1. Given the standard array of collections that exist in .NET, if you had thread-safe versions of them, are there scenarios where you would benefit from being able to enumerate concurrently with other threads modifying the same collection? We already know about some, but we'd love to hear about more.
  2. Assuming you do have scenarios for #1, what are the minimum guarantees you'd need about the data returned in the enumerator for it to be useful? For example, you could say for a thread-safe dictionary that if there are no concurrent modifications (adds/updates/removals), you'll get back in the enumeration exactly what's in the dictionary, and if there are concurrent accesses, you'll never get back something that wasn't in the dictionary, you may not see concurrent adds or updates, and you may still see items that are concurrently removed.
  3. If we did support enumeration on collections that were thread-safe in the face of concurrent modifications, would you ever want the ability to revert to the "throw on modifications" behavior?
  4. Finally, what are the most important collections that you'd like to see thread-safe and scalable counterparts for?

Omer van Kloeten ha a great question about serialization,

A great addition to that would be to be able to asynchronously dump that cache from memory to persistent storage at times for backup, which would require safe enumeration. (Actually thinking about it - will serialization be thread-safe as well? What will be the default behavior there?)

Rick Brewster proposes we turn to functional programming concepts. Instead of locking the collection externally, all operations against it are performed by passing in a delegate to a "Lock(Action)" method.

The idea being that the 'Lock' method automatically surrounds your callback with a lock/unlock type of construct. This way you are also not limited to the atomic operations that the class currently exposes. In fact, maybe instead of Lock() taking an Action, it takes a Func<ConcurrentListData>, and the ConcurrentList class itself doesn't actually have any methods for get, set, enumerate. This way you can ONLY access the class within a locked scope.

No comments

Watch Thread Reply

Educational Content

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.

Interview: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives

InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.

Beauty Is in the Eye of the Beholder

Alex Papadimoulis discusses ugly code, where it comes from, how to avoid it, and how to get rid of it.

Architecting Visa for Massive Scale and Continuous Innovation

John Davies examines Visa’s architecture and shows how enterprises have architected complex integrations incorporating Hadoop, memcached, Ruby on Rails, and others to deliver innovative solutions.

Max Protect: Scalability and Caching at ESPN.com

Sean Comerford unveils ESPN.com’s architecture, what components are used and why, and the current changes the website goes through.

The Seven Deadly Sins of Enterprise Agile Adoption

Are there repeated patterns of failure on Enterprise Agile Enablement efforts? Sanjiv and Arlen discuss Seven Deadly Sins to avoid when adopting Agile in an enterprise.

Questions for an Enterprise Architect

Erik Dörnenburg answers: What is Enterprise and Evolutionary Architecture?, discussing 4 issues: Turning strategy into execution, Ensuring conformance, Where do the architects sit? Buying or building?