InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Column-based Storage in SQL Server 2011

Posted by Jonathan Allen on Mar 07, 2011

Sections
Enterprise Architecture,
Operations & Infrastructure,
Architecture & Design,
Development
Topics
SQL Server 2011 ,
SQL Server ,
SQL Server 2012 ,
Microsoft ,
.NET ,
Data Access ,
Data Analysis ,
Relational Databases ,
Database Design ,
Database ,
Programming ,
Companies ,
Data Warehousing ,
OLAP ,
OLTP

Like most major databases, SQL Server will normally store a table as B-tree if they have a clustered index, otherwise it uses Heap. Both methods are essentially row based, where the number of rows per page varies depending on the overall row size. Starting with SQL Server 2011 a third option becomes available. By applying a “Columnstore Index”, SQL Server will store data in terms of columns instead of rows.

When using a table with 1 TB of data and 1.44 billion rows, Microsoft claims that column-orientated queries saw a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time. In real terms this means a query that took 501 seconds originally was reduced to merely 1.1 seconds. This test was performed on a 32-logical processor machine with 256GB of RAM.

This amazing improvement is gained by isolating each column to its own set of pages. When a query is performed only the columns in the result set are loaded from disk. The pages containing the other columns are simply ignored.

This is very much like having a covering index for every conceivable combination of columns. But instead of costing huge amounts of hard drive space, it actually takes less than a traditional table. Since SQL Server compression occurs at the page level, and a column is more likely to have repeating data than a row, tables with columnstore indexes are expected to higher compression levels.

The decision to use a columnstore index cannot be taken lightly. First and foremost, they are not updatable. Once the columnstore index has been created no new inserts, updates, or deletes are allowed against the table. Microsoft expects most shops to use a daily refresh cycle and otherwise treat the data as read-only. During the refresh cycle the index is dropped, data is updated, and then the index is reapplied. As this is certainly an expensive operation, one could use vertical partitioning to limit the churn to a subset of the logical table.

The use of columnstore indexes can also result in performance degradation. If you are working with most or all of the columns, recombining the rows can be quite expensive. This means OLTP-style queries should be avoided in favor of OLAP-style queries. Or in other words, if you find yourself writing “SELECT *” or pulling back one row at a time then columnstore isn’t appropriate for you.

Column-based Storage - Great news but old news by Michael Kramer Posted
This is great by N T Posted
Re: This is great by Francois Ward Posted
Re: This is great by The Dood Posted
  1. Back to top

    Column-based Storage - Great news but old news

    by Michael Kramer

    Column-base storage, create only, no insert/update/delete? I'm thinking lite edition of Sybase IQ.
    But it is certainly a great value-add for MSSQL anyway.

  2. Back to top

    This is great

    by N T

    Microsoft in 2011 will be able to do what Google could do in 1997!

  3. Back to top

    Re: This is great

    by Francois Ward

    Too bad Google didn't sell their engine to be used for any purpose to the public...

  4. Back to top

    Re: This is great

    by The Dood

    Hadoop, and there's plenty others.