Like most major databases, SQL Server will normally store a table as B-tree if they have a clustered index, otherwise it uses Heap. Both methods are essentially row based, where the number of rows per page varies depending on the overall row size. Starting with SQL Server 2011 a third option becomes available. By applying a “Columnstore Index”, SQL Server will store data in terms of columns instead of rows.
When using a table with 1 TB of data and 1.44 billion rows, Microsoft claims that column-orientated queries saw a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time. In real terms this means a query that took 501 seconds originally was reduced to merely 1.1 seconds. This test was performed on a 32-logical processor machine with 256GB of RAM.
This amazing improvement is gained by isolating each column to its own set of pages. When a query is performed only the columns in the result set are loaded from disk. The pages containing the other columns are simply ignored.
This is very much like having a covering index for every conceivable combination of columns. But instead of costing huge amounts of hard drive space, it actually takes less than a traditional table. Since SQL Server compression occurs at the page level, and a column is more likely to have repeating data than a row, tables with columnstore indexes are expected to higher compression levels.
The decision to use a columnstore index cannot be taken lightly. First and foremost, they are not updatable. Once the columnstore index has been created no new inserts, updates, or deletes are allowed against the table. Microsoft expects most shops to use a daily refresh cycle and otherwise treat the data as read-only. During the refresh cycle the index is dropped, data is updated, and then the index is reapplied. As this is certainly an expensive operation, one could use vertical partitioning to limit the churn to a subset of the logical table.
The use of columnstore indexes can also result in performance degradation. If you are working with most or all of the columns, recombining the rows can be quite expensive. This means OLTP-style queries should be avoided in favor of OLAP-style queries. Or in other words, if you find yourself writing “SELECT *” or pulling back one row at a time then columnstore isn’t appropriate for you.
Community comments
Column-based Storage - Great news but old news
by Michael Kramer,
This is great
by N T,
Re: This is great
by Francois Ward,
Re: This is great
by The Dood,
Column-based Storage - Great news but old news
by Michael Kramer,
Your message is awaiting moderation. Thank you for participating in the discussion.
Column-base storage, create only, no insert/update/delete? I'm thinking lite edition of Sybase IQ.
But it is certainly a great value-add for MSSQL anyway.
This is great
by N T,
Your message is awaiting moderation. Thank you for participating in the discussion.
Microsoft in 2011 will be able to do what Google could do in 1997!
Re: This is great
by Francois Ward,
Your message is awaiting moderation. Thank you for participating in the discussion.
Too bad Google didn't sell their engine to be used for any purpose to the public...
Re: This is great
by The Dood,
Your message is awaiting moderation. Thank you for participating in the discussion.
Hadoop, and there's plenty others.