BT
x Your opinion matters! Please fill in the InfoQ Survey about your reading habits!

Partitioning and Sharding Options for SQL Server and SQL Azure

by Jonathan Allen on Feb 14, 2011 |

Horizontal partitioning is an important tool for developers working with extremely large datasets. While modern database servers like SQL Server can support terabytes of RAM and hundreds of logical processors, there is still a practical limit to the amount of data that can stored in a single table. This is where horizontal partitioning comes into play. Horizontal partitioning can be done both within a single server and across multiple servers, the latter often being referred to as sharding.

In SQL Server 2005, Microsoft added the ability to create up to 1,000 partitions per table. This form of partitioning occurs inside a single server and is used to spread a single logical table across multiple file groups. This improves I/O right away and with a well-designed schema it can also drastically improve other aspects of performance. Unfortunately there are numerous downsides to this feature. Since it is limited to a single machine you need a powerful database server and a storage array network for it to be effective. In addition to the hardware costs, the horizontal partitioning feature requires Enterprise or Datacenter licenses which have a retail price of 27,495 and 54,990 per processor.

Those interested in this can read the white paper Partitioned Table and Index Strategies Using SQL Server 2008. It is rather lengthy, but should be considered required reading by anyone who wants to use this feature in SQL Server. And of course comparable features can be found in DB2, Oracle, Sybase Adapter Server, and MySQL.

Fully partitioning data across multiple servers isn’t something that SQL Server does out of the box. While it is certainly possible to implement the necessary logic in stored procedures or service tier code, such ad-hoc methods tend to be unsatisfactory to developers who want to focus on the real requirements of their product. One can also use SQL Server’s Distributed Partitioned Views, but the rules are rather onerous. For example, one cannot use identity columns or timestamps for the partitioning column, but the partitioning column has to be part of the primary key.

SQL Azure promises to offer partitioning via what they call “federation”. At first glance this appears to be a vast improvement but Federation in SQL Azure is still a long way away from even a preview release.

In the meantime those wishing to stick with SQL Server can try turning to third party tools. One such option is the recently released Enzo SQL Shard library. Based on .NET’s Task Parallel Library, this open source project from Blue Syntax promises to offer sharding for SQL Server and SQL Azure.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Cost benefit of Sharding by Roopesh Shenoy

There is also another benefit of sharding, that can make all the effort worthwhile - dramatic cost reductions. This is especially true if there are predictable spikes in traffic (say selling tickets before a major event), where the data can be sharded over-night and then composed back.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

1 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT