InfoQ

News

New Compression Features in SQL Server 2008

Posted by Jonathan Allen on Nov 15, 2007

Community
.NET
Topics
Data Warehousing ,
Database Design
Tags
SQL Server 2008

The SQL Server compression story actually starts with SQL Server 2005. The Enterprise and Developer editions added a new storage format called vardecimal. This table-level option affects decimal and numeric fields. Values that require a lower precision than the field allows, for example storing 1.5 in a decimal(18,9), the storage needs are correspondingly reduced. In effect, it is a numeric version of varchar.

SQL Server 2008 goes far beyond this comparably simple trick. Chad Boyd writes,

However, Sql Server 2008's data compression is vastly different from this (though it still supports/includes the vardecimal format as well) - so different in fact, that if you enable data compression on a given table/index, the underlying row/page format is different - yes, that's right, you heard correctly - if you use compression (ROW or PAGE), the row/page format for Sql 2008 will be DIFFERENT from the existing row/page format (only for the table/index(es) you are using compression for). So, in Sql 2008, there are 2, yes 2, row/page formats for data. You may now be wondering "well, if the row/page format changed, how in the world did you have enough time to re-engineer every component in Sql Server that is aware of the format in such a small amount of time? The answer is that we didn't - the Storage Engine is the ONLY component in Sql 2008 that is aware of the new row/page format.

Row level compression drastically reduces the meta-data needed for variable length columns. Instead of storing the length of each field in 2 bytes, it now takes a mere 3 bits. Fields themselves are also smaller. Storing a 1 in an int field now only takes a single byte, though of course larger values may use up to 4 bytes.

Page level compression allows common data to be shared across rows. The first two techniques that Chad discusses are column prefix and page dictionary.

Assume you have a column of data on a single page of rows that contain values like 'Chad', 'Chadwick', 'Chadly', 'Chad', 'Chadster', 'Chadwick', and 'Chadly' (values repeated purposely) - as you can tell, there's quite a bit of redundant data 'prefixing' each of the rows in this column on this page, yes? So, what you might end up with in a scenario like this would be a column prefix value of 'Chad' stored in the CI structure, and each column ending up with pointers to that prefix value - resulting in values like this on-disk: '', '1wick', '1ly', '', '1ster', '1wick', and '1ly'

So, using the same column value samples as above with the Chad's, after the "column prefix" values are calculated and stored as mentioned above, you'd potentially end up with a page dictionary that contained the values '1ly', and '1wick', then the in-line row-values would ultimately look something like '', '2', '3', '', '1ster', '3', and '2'. In this case, we went from something in the lines of our original ~25 bytes of storage to around ~17 bytes of storage - another ~30+% savings.

Each page is compressed separately, with the prefixes and dictionaries stored in the page itself. Compression does not occur until the page is nearly full, as pages are allocated atomically and compressing a half a page to a quarter page wouldn't gain anything.

There are performance trade-offs for using row and page compression. CPU utilization will go up, but I/O utilization and memory utilization will go down.

Backup Compression is another feature for 2008. This is done using normal file system-style compression techniques. There is no tuning options, backup compression is either on or off for a given database.

All of the compression options are most likely going to be an enterprise-only option, though non-enterprise servers can restore a compressed backup.

50 New features of SQL Server 2008 by Dutt Dutt Posted Nov 6, 2008 4:59 AM
  1. Back to top

    50 New features of SQL Server 2008

    Nov 6, 2008 4:59 AM by Dutt Dutt

Educational Content

Brian Marick on 4 Challenges and 5 Guiding Values of Agile Software Development

Brian Marick takes us through a quick tour of the most important values and challenges to adopting Agile successfully (they aren't the typical challenges and values we hear in the community).

Are You a Software Architect?

The line between development and architecture is tricky. Does it exist at all? Is an ivory tower actually needed? There's a balance in the middle, but how do you move from developer to architect?

Agile – A Way of Life and Pragmatic Use of Authority

The word 'authority' sometimes produces an allergic response in hard-line agilists. Freedom and authority – both are bad if misused and both are good if used in right spirit for a noble cause.

Getting Started with Grails, Second Edition

"Getting Started with Grails" brings you up to speed on this modern web framework. Companies as varied as LinkedIn, Wired, and Taco Bell are all using Grails. Are you ready to get started as well?

Using ITIL V3 as a Foundation for SOA Governance

Those familiar with only ITIL V2 often scoff at the thought that ITIL could serve as a governance framework for SOA. With ITIL V3, the focus of the framework shifted towards service-orientation.

Adrian Colyer on AspectJ, tc Server and dm Server

SpringSource CTO Adrian Colyer discusses AspectJ, SpringSource's dm Server and tc Server products, OSGi and Scrum.

Adam Wiggins on Heroku

Heroku's Adam Wiggins talks about Rails, Background Jobs, Add-Ons, Ruby, and how Heroku manages to work around Ruby's inefficiencies using Erlang and other languages.

SOA as an Architectural Pattern: Best Practices in Software Architecture

For Grady Booch the foundation of a good architecture is patterns, SOA being just one of many patterns. In this Second Life presentation, Booch attempts to bring more clarity on what architecture is.