Sparse Columns, Filtered Indexes, and Column Sets

Sparse Columns, Filtered Indexes, and Column Sets are new features for SQL Server 2008. While they allow you to break the 1024 limit on columns and save significant disk space, if not used correctly they can be a disaster.

As the name implies, Spare Columns are designed to be used with columns that are usually null. While the space savings they offer are impressive, they are only gained when the columns abide by certain constraints.

The first rule for sparse columns is that they really must be sparse. When null, the data point takes up no space just as if the column didn’t exist. But place any value in it and it will take 4 more bytes than usual. This even applies to a bit column, which grows from 0.125 to 4.125 bytes. This puts the critical point for bit columns at 98% null. For larger fields the gains are seen much earlier, for example the critical point for datetime columns is only 52% of the rows. In these examples, the critical point is where you see at least a 40% savings in space. SQL Server Books Online has a chart for sparse columns showing each data type and when you should consider marking a column sparse.

The second rule is that indexes must be written with sparse columns in mind. While a normal index can be used, it wastes a lot of space for nulls you are probably not going to query against. The solution is another new feature for SQL Server called Filtered Indexes. A filtered index has a where clause that prevents it from indexing rows that don't meet certain criteria. In the case of sparse columns, that criteria will almost always be where "column_name IS NOT NULL".

Sparse columns are also going to be slower than a normal column, so consider avoiding them in performance sensitive queries where CPU, rather than I/O, is the bounding constraint.

As an alternative to using sparse column, you can apply a filtered index to a normal column. This can make the indexes significantly smaller while avoiding the limitations on sparse columns. And as mentioned before, you can filter on more than just whether or not the column is null.

If you want to break the 1024 column limit, you need to turn to Column Sets. A column set allows you to bundle up all the extra columns into a single XML column for query purposes. As long as you don't try to return more than 1024 columns from a single query, you can still access individual columns normally.

According to Qingsong Yao, Microsoft needed to preserve the 1024 limit for clients,

In order to create more than 1024 columns, you have to have a columnset column defined in the table. We explict request this because the client can not retrieve more than 1024 columns, and we do not want people facing the issue of not being to retrieving data. Once the table has columnset defined, the select * will hide all sparse columns, and only see the column set. Neanwhile, people can still select individual sparse columns in their query.

Column sets have to part of a table's original design. If a table has any sparse columns in it, a column set cannot be added. Though once created, newly added sparse columns will be automatically added to the column set.

Since column sets appear as XML, care must be taken to not change XML in such a way that it cannot be mapped back to the underlying columns.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Infrastructure topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter