Whitepaper Released: Sharding with SQL Azure
Yesterday Microsoft released a new whitepaper, written by Michael Heydt and Michael Thomassy, providing guidance on sharding with SQL Azure. Because SQL Azure currently has a limit of 50GB per instance, one must employ this technique of horizontal partitioning to scale to larger sizes and achieve application scale-out. The intent of the whitepaper is to deliver guidance on how to architect an application that requires elasticity and fluidity of resources at the data layer over time.
The whitepaper provides:
- basic concepts in horizontal partitioning / sharding
- an overview of patterns and best practices
- challenges which may present themselves
- high-level design of an ADO.NET sharding library
- an introduction to SQL Azure Federations
While horizontal partitioning splits one or more tables by row, it usually does so within the same database instance. The advantage is reduced index size, which, in theory, provides faster data retrieval. In contrast, sharding tackles the same problem by splitting the table across multiple instances of the database, which typically reside on separate hardware and therefore require some form of notification and replication to keep the tables synchronized.
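To make the difference concrete, here is a minimal sketch of rows from one logical table being spread across separate database instances. It is written in Python rather than ADO.NET purely for brevity, and it is not taken from the whitepaper: the range map `SHARD_RANGES`, the `customers` table, and all function names are hypothetical, and in-memory SQLite databases stand in for separate SQL Azure databases.

```python
import sqlite3

# Hypothetical range map: each shard owns a half-open range of customer IDs.
# These ranges are an assumption made for illustration only.
SHARD_RANGES = [(0, 1000), (1000, 2000), (2000, 3000)]


def open_shards():
    """Open one in-memory SQLite database per range (stand-ins for real shards)."""
    shards = []
    for _ in SHARD_RANGES:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        shards.append(conn)
    return shards


def shard_index_for(customer_id):
    """Return the index of the shard whose range contains customer_id."""
    for index, (low, high) in enumerate(SHARD_RANGES):
        if low <= customer_id < high:
            return index
    raise KeyError(f"no shard covers customer id {customer_id}")


def insert_customer(shards, customer_id, name):
    """Route the insert to the single shard that owns this ID."""
    conn = shards[shard_index_for(customer_id)]
    conn.execute(
        "INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name)
    )
    conn.commit()


def find_customer(shards, customer_id):
    """A lookup by key touches exactly one shard."""
    conn = shards[shard_index_for(customer_id)]
    return conn.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()


def count_all_customers(shards):
    """A query without a key must fan out to every shard and merge the results."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        for conn in shards
    )
```

Lookups by key stay cheap because they hit a single, smaller database, while queries that span the whole table have to visit every shard and combine the results in the application.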
In the Microsoft sharding pattern, a “sharding key”, which is the primary key of one of the data entities, is used to map data to specific shards. Related data entities are clustered into a set based upon the shared shard key, and this set is referred to as an atomic unit. All records in an atomic unit are stored in the same shard. Additionally, rebalancing shards should be an offline process, because keys must be remapped as the physical infrastructure is modified.
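The idea of an atomic unit, and the reason rebalancing is disruptive, can be sketched as follows. This is an illustration under stated assumptions, not the whitepaper's design: the hash-modulo mapping in `shard_for`, the `ShardedStore` class, and `keys_that_move` are all hypothetical, and plain Python dictionaries stand in for databases.

```python
import hashlib
from collections import defaultdict


def shard_for(shard_key, shard_count):
    """Map a shard key to a shard index with a stable hash.

    Hash-modulo is an assumed scheme, chosen only to keep the sketch short.
    """
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Keeps every record that shares a shard key in the same shard."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        # One dict per shard: shard key -> list of (entity_type, record).
        self.shards = [defaultdict(list) for _ in range(shard_count)]

    def save(self, shard_key, entity_type, record):
        """Store a record next to all other records with the same shard key."""
        shard = self.shards[shard_for(shard_key, self.shard_count)]
        shard[shard_key].append((entity_type, record))

    def load_atomic_unit(self, shard_key):
        """Read the whole atomic unit from the one shard that holds it."""
        shard = self.shards[shard_for(shard_key, self.shard_count)]
        return list(shard.get(shard_key, []))


def keys_that_move(shard_keys, old_count, new_count):
    """List the keys whose shard changes when the number of shards changes."""
    return [
        key
        for key in shard_keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    ]
```

For example, saving a customer and its orders under the same key (`store.save(7, "customer", ...)`, `store.save(7, "order", ...)`) places them in one shard, so `store.load_atomic_unit(7)` never needs a cross-shard join. Calling `keys_that_move(range(1000), 4, 5)` shows how many atomic units would have to be relocated when a fifth shard is added under this particular mapping, which hints at why rebalancing is easier to do offline.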
Microsoft will release SQL Azure Federations, which will support sharding at the database level, in 2011. Until then, all sharding capabilities must be implemented at the application level using ADO.NET. This is in contrast to current “NoSQL” alternatives such as MongoDB, CouchDB, and SimpleDB, which already support sharding.