Partitioning and Sharding Options for SQL Server and SQL Azure
Horizontal partitioning is an important tool for developers working with extremely large datasets. While modern database servers like SQL Server can support terabytes of RAM and hundreds of logical processors, there is still a practical limit to the amount of data that can stored in a single table. This is where horizontal partitioning comes into play. Horizontal partitioning can be done both within a single server and across multiple servers, the latter often being referred to as sharding.
In SQL Server 2005, Microsoft added the ability to create up to 1,000 partitions per table. This form of partitioning occurs inside a single server and is used to spread a single logical table across multiple file groups. This improves I/O right away and with a well-designed schema it can also drastically improve other aspects of performance. Unfortunately there are numerous downsides to this feature. Since it is limited to a single machine you need a powerful database server and a storage array network for it to be effective. In addition to the hardware costs, the horizontal partitioning feature requires Enterprise or Datacenter licenses which have a retail price of 27,495 and 54,990 per processor.
Those interested in this can read the white paper Partitioned Table and Index Strategies Using SQL Server 2008. It is rather lengthy, but should be considered required reading by anyone who wants to use this feature in SQL Server. And of course comparable features can be found in DB2, Oracle, Sybase Adapter Server, and MySQL.
Fully partitioning data across multiple servers isn’t something that SQL Server does out of the box. While it is certainly possible to implement the necessary logic in stored procedures or service tier code, such ad-hoc methods tend to be unsatisfactory to developers who want to focus on the real requirements of their product. One can also use SQL Server’s Distributed Partitioned Views, but the rules are rather onerous. For example, one cannot use identity columns or timestamps for the partitioning column, but the partitioning column has to be part of the primary key.
SQL Azure promises to offer partitioning via what they call “federation”. At first glance this appears to be a vast improvement but Federation in SQL Azure is still a long way away from even a preview release.
In the meantime those wishing to stick with SQL Server can try turning to third party tools. One such option is the recently released Enzo SQL Shard library. Based on .NET’s Task Parallel Library, this open source project from Blue Syntax promises to offer sharding for SQL Server and SQL Azure.
Cost benefit of Sharding