Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage News Column-based Storage in SQL Server 2011

Column-based Storage in SQL Server 2011

This item in japanese

Like most major databases, SQL Server will normally store a table as B-tree if they have a clustered index, otherwise it uses Heap. Both methods are essentially row based, where the number of rows per page varies depending on the overall row size. Starting with SQL Server 2011 a third option becomes available. By applying a “Columnstore Index”, SQL Server will store data in terms of columns instead of rows.

When using a table with 1 TB of data and 1.44 billion rows, Microsoft claims that column-orientated queries saw a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time. In real terms this means a query that took 501 seconds originally was reduced to merely 1.1 seconds. This test was performed on a 32-logical processor machine with 256GB of RAM.

This amazing improvement is gained by isolating each column to its own set of pages. When a query is performed only the columns in the result set are loaded from disk. The pages containing the other columns are simply ignored.

This is very much like having a covering index for every conceivable combination of columns. But instead of costing huge amounts of hard drive space, it actually takes less than a traditional table. Since SQL Server compression occurs at the page level, and a column is more likely to have repeating data than a row, tables with columnstore indexes are expected to higher compression levels.

The decision to use a columnstore index cannot be taken lightly. First and foremost, they are not updatable. Once the columnstore index has been created no new inserts, updates, or deletes are allowed against the table. Microsoft expects most shops to use a daily refresh cycle and otherwise treat the data as read-only. During the refresh cycle the index is dropped, data is updated, and then the index is reapplied. As this is certainly an expensive operation, one could use vertical partitioning to limit the churn to a subset of the logical table.

The use of columnstore indexes can also result in performance degradation. If you are working with most or all of the columns, recombining the rows can be quite expensive. This means OLTP-style queries should be avoided in favor of OLAP-style queries. Or in other words, if you find yourself writing “SELECT *” or pulling back one row at a time then columnstore isn’t appropriate for you.

Rate this Article