InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

On the Evolution of the .NET Collections

Posted by Jonathan Allen on Jun 05, 2008

Sections
Architecture & Design,
Development,
Operations & Infrastructure
Topics
Programming ,
.NET ,
.NET Framework ,
Open Source

The collections in the .NET Framework have evolved significantly over the years. Taking advantage of Microsoft's new found openness, we show two versions of a familiar data structure, the hash table, in both .NET and Mono.

In theory the hash table is a rather simple construct, just collection of arrays or linked lists divided into a finite number of buckets. Yet even the act of retrieving an item can be rather complex if you try to be too clever with multi-threading logic.

//   Copyright (c) Microsoft Corporation.  All rights reserved.

public virtual Object this[Object key] {
         get {
             if (key == null) {
                 throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
             }
             uint seed;
             uint incr;

             // Take a snapshot of buckets, in case another thread does a resize
             bucket[] lbuckets = buckets;
             uint hashcode = InitHash(key, lbuckets.Length, out seed, out incr);
             int  ntry = 0;
             bucket b;
             int bucketNumber = (int) (seed % (uint)lbuckets.Length);
             do
             {
                 int currentversion;
                 //     A read operation on hashtable has three steps:
                 //        (1) calculate the hash and find the slot number.
                 //        (2) compare the hashcode, if equal, go to step 3. Otherwise end.
                 //        (3) compare the key, if equal, go to step 4. Otherwise end.
                 //        (4) return the value contained in the bucket.
                 //     After step 3 and before step 4. A writer can kick in a remove the old item and add a new one
                 //     in the same bukcet. So in the reader we need to check if the hash table is modified during above steps.
                 //
                 // Writers (Insert, Remove, Clear) will set 'isWriterInProgress' flag before it starts modifying
                 // the hashtable and will ckear the flag when it is done.  When the flag is cleared, the 'version'
                 // will be increased.  We will repeat the reading if a writer is in progress or done with the modification
                 // during the read.
                 //
                 // Our memory model guarantee if we pick up the change in bucket from another processor,
                 // we will see the 'isWriterProgress' flag to be true or 'version' is changed in the reader.
                 //
                 int spinCount = 0;
                 do {
                     // this is violate read, following memory accesses can not be moved ahead of it.
                     currentversion = version;
                     b = lbuckets[bucketNumber];

                     // The contention between reader and writer shouldn't happen frequently.
                     // But just in case this will burn CPU, yield the control of CPU if we spinned a few times.
                     // 8 is just a random number I pick.
                     if( (++spinCount) % 8 == 0 ) {
                         Thread.Sleep(1);   // 1 means we are yeilding control to all threads, including low-priority ones.
                     }
                 } while ( isWriterInProgress || (currentversion != version) );
                 if (b.key == null) {
                     return null;
                 }
                 if (((b.hash_coll & 0x7FFFFFFF) == hashcode) &&
                     KeyEquals (b.key, key))
                     return b.val;
                 bucketNumber = (int) (((long)bucketNumber + incr)% (uint)lbuckets.Length);
             } while (b.hash_coll < 0 && ++ntry < lbuckets.Length);
             return null;
         }

Yes folks, there is a pseudo-spin lock and call to Threading.Sleep in there.

With .NET 2.0 and generic collections, Microsoft decided to dump the internal locking mechanisms for collections. Instead, any locking must be applied externally. This gives us the much simpler code found in the Generic.Dictionary class.

//   Copyright (c) Microsoft Corporation.  All rights reserved.

private int FindEntry(TKey key) {
    if( key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }
    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}

The Mono versions of the Hashtable and Dictionary are likewise very different from each other and from the Microsoft implementations.

// Copyright (C) 2004-2005 Novell, Inc (http://www.novell.com)

public virtual Object this[Object key]
{
     get
     {
         if (key == null)
             throw new ArgumentNullException("key", "null key");

         Slot[] table = this.table;
         int[] hashes = this.hashes;
         uint size = (uint)table.Length;
         int h = this.GetHash(key) & Int32.MaxValue;
         uint indx = (uint)h;
         uint step = (uint)((h >> 5) + 1) % (size - 1) + 1;

         for (uint i = size; i > 0; i--)
         {
             indx %= size;
             Slot entry = table[indx];
             int hashMix = hashes[indx];
             Object k = entry.key;
             if (k == null)
                 break;

             if (k == key || ((hashMix & Int32.MaxValue) == h
                 && this.KeyEquals(key, k)))
             {
                 return entry.value;
             }

             if ((hashMix & CHAIN_MARKER) == 0)
                 break;

             indx += step;
         }

         return null;
     }

And their dictionary implementation.

// Copyright (C) 2004 Novell, Inc (http://www.novell.com)
// Copyright (C) 2005 David Waite
// Copyright (C) 2007 HotFeet GmbH (
http://www.hotfeet.ch)

public TValue this [TKey key] {
    get {
        if (key == null)
            throw new ArgumentNullException ("key");

        // get first item of linked list corresponding to given key
        int hashCode = hcp.GetHashCode (key);
        int cur = table [(hashCode & int.MaxValue) % table.Length] - 1;
        // walk linked list until right slot is found or end is reached
        while (cur != NO_SLOT) {
            // The ordering is important for compatibility with MS and strange
            // Object.Equals () implementations
            if (linkSlots [cur].HashCode == hashCode && hcp.Equals (keySlots [cur], key))
                return valueSlots [cur];
            cur = linkSlots [cur].Next;
        }
        throw new KeyNotFoundException ();
    }

 

Well there you have it; four ways to do essentially the same time. The one with the Sleep command is undoubtedly the worse, I'll it to you to decide which is the best.

Collections by Sam Allen Posted
  1. Back to top

    Collections

    by Sam Allen

    Very interesting, wish I knew more

Educational Content

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.

Interview: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives

InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.