InfoQ

InfoQ

News

My Bookmarks

Login or Register to enable bookmarks for unlimited time.

The content has been bookmarked!

There was an error bookmarking this content! Please retry.

Sparse Columns, Filtered Indexes, and Column Sets

Posted by Jonathan Allen on Aug 26, 2008

Sections
Development,
Operations & Infrastructure
Topics
SQL Server ,
.NET
Tags
SQL Server 2008

Sparse Columns, Filtered Indexes, and Column Sets are new features for SQL Server 2008. While they allow you to break the 1024 limit on columns and save significant disk space, if not used correctly they can be a disaster.

As the name implies, Spare Columns are designed to be used with columns that are usually null. While the space savings they offer are impressive, they are only gained when the columns abide by certain constraints.

The first rule for sparse columns is that they really must be sparse. When null, the data point takes up no space just as if the column didn’t exist. But place any value in it and it will take 4 more bytes than usual. This even applies to a bit column, which grows from 0.125 to 4.125 bytes. This puts the critical point for bit columns at 98% null. For larger fields the gains are seen much earlier, for example the critical point for datetime columns is only 52% of the rows. In these examples, the critical point is where you see at least a 40% savings in space. SQL Server Books Online has a chart for sparse columns showing each data type and when you should consider marking a column sparse.

The second rule is that indexes must be written with sparse columns in mind. While a normal index can be used, it wastes a lot of space for nulls you are probably not going to query against. The solution is another new feature for SQL Server called Filtered Indexes. A filtered index has a where clause that prevents it from indexing rows that don't meet certain criteria. In the case of sparse columns, that criteria will almost always be where "column_name IS NOT NULL".

Sparse columns are also going to be slower than a normal column, so consider avoiding them in performance sensitive queries where CPU, rather than I/O, is the bounding constraint.

As an alternative to using sparse column, you can apply a filtered index to a normal column. This can make the indexes significantly smaller while avoiding the limitations on sparse columns. And as mentioned before, you can filter on more than just whether or not the column is null.

If you want to break the 1024 column limit, you need to turn to Column Sets. A column set allows you to bundle up all the extra columns into a single XML column for query purposes. As long as you don't try to return more than 1024 columns from a single query, you can still access individual columns normally.

According to Qingsong Yao, Microsoft needed to preserve the 1024 limit for clients,

In order to create more than 1024 columns, you have to have a columnset column defined in the table. We explict request this because the client can not retrieve more than 1024 columns, and we do not want people facing the issue of not being to retrieving data. Once the table has columnset defined, the select * will hide all sparse columns, and only see the column set. Neanwhile, people can still select individual sparse columns in their query.

Column sets have to part of a table's original design. If a table has any sparse columns in it, a column set cannot be added. Though once created, newly added sparse columns will be automatically added to the column set.

Since column sets appear as XML, care must be taken to not change XML in such a way that it cannot be mapped back to the underlying columns.

No comments

Watch Thread Reply

Educational Content

New-age Transactional Systems - Not Your Grandpa's OLTP

John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.

Cool Code

Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.

Collaboration: At the Extremities of Extreme

Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.

Yesod Web Framework

Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).

Transactions without Transactions

Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.

Attila Szegedi on JVM and GC Performance Tuning at Twitter

Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.

10 tips on how to prevent business value risk

One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.

Interview: Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives

InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.