Column-based Storage in SQL Server 2011

| by Jonathan Allen on Mar 07, 2011. Estimated reading time: 2 minutes |

Like most major databases, SQL Server will normally store a table as B-tree if they have a clustered index, otherwise it uses Heap. Both methods are essentially row based, where the number of rows per page varies depending on the overall row size. Starting with SQL Server 2011 a third option becomes available. By applying a “Columnstore Index”, SQL Server will store data in terms of columns instead of rows.

When using a table with 1 TB of data and 1.44 billion rows, Microsoft claims that column-orientated queries saw a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time. In real terms this means a query that took 501 seconds originally was reduced to merely 1.1 seconds. This test was performed on a 32-logical processor machine with 256GB of RAM.

This amazing improvement is gained by isolating each column to its own set of pages. When a query is performed only the columns in the result set are loaded from disk. The pages containing the other columns are simply ignored.

This is very much like having a covering index for every conceivable combination of columns. But instead of costing huge amounts of hard drive space, it actually takes less than a traditional table. Since SQL Server compression occurs at the page level, and a column is more likely to have repeating data than a row, tables with columnstore indexes are expected to higher compression levels.

The decision to use a columnstore index cannot be taken lightly. First and foremost, they are not updatable. Once the columnstore index has been created no new inserts, updates, or deletes are allowed against the table. Microsoft expects most shops to use a daily refresh cycle and otherwise treat the data as read-only. During the refresh cycle the index is dropped, data is updated, and then the index is reapplied. As this is certainly an expensive operation, one could use vertical partitioning to limit the churn to a subset of the logical table.

The use of columnstore indexes can also result in performance degradation. If you are working with most or all of the columns, recombining the rows can be quite expensive. This means OLTP-style queries should be avoided in favor of OLAP-style queries. Or in other words, if you find yourself writing “SELECT *” or pulling back one row at a time then columnstore isn’t appropriate for you.

Rate this Article


Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Column-based Storage - Great news but old news by Michael Kramer

Column-base storage, create only, no insert/update/delete? I'm thinking lite edition of Sybase IQ.
But it is certainly a great value-add for MSSQL anyway.

This is great by N T

Microsoft in 2011 will be able to do what Google could do in 1997!

Re: This is great by Francois Ward

Too bad Google didn't sell their engine to be used for any purpose to the public...

Re: This is great by The Dood

Hadoop, and there's plenty others.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

4 Discuss