New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
The content has been bookmarked!
There was an error bookmarking this content! Please retry.
Posted by Jonathan Allen on Aug 26, 2008
Sparse Columns, Filtered Indexes, and Column Sets are new features for SQL Server 2008. While they allow you to break the 1024 limit on columns and save significant disk space, if not used correctly they can be a disaster.
As the name implies, Spare Columns are designed to be used with columns that are usually null. While the space savings they offer are impressive, they are only gained when the columns abide by certain constraints.
The first rule for sparse columns is that they really must be sparse. When null, the data point takes up no space just as if the column didn’t exist. But place any value in it and it will take 4 more bytes than usual. This even applies to a bit column, which grows from 0.125 to 4.125 bytes. This puts the critical point for bit columns at 98% null. For larger fields the gains are seen much earlier, for example the critical point for datetime columns is only 52% of the rows. In these examples, the critical point is where you see at least a 40% savings in space. SQL Server Books Online has a chart for sparse columns showing each data type and when you should consider marking a column sparse.
The second rule is that indexes must be written with sparse columns in mind. While a normal index can be used, it wastes a lot of space for nulls you are probably not going to query against. The solution is another new feature for SQL Server called Filtered Indexes. A filtered index has a where clause that prevents it from indexing rows that don't meet certain criteria. In the case of sparse columns, that criteria will almost always be where "column_name IS NOT NULL".
Sparse columns are also going to be slower than a normal column, so consider avoiding them in performance sensitive queries where CPU, rather than I/O, is the bounding constraint.
As an alternative to using sparse column, you can apply a filtered index to a normal column. This can make the indexes significantly smaller while avoiding the limitations on sparse columns. And as mentioned before, you can filter on more than just whether or not the column is null.
If you want to break the 1024 column limit, you need to turn to Column Sets. A column set allows you to bundle up all the extra columns into a single XML column for query purposes. As long as you don't try to return more than 1024 columns from a single query, you can still access individual columns normally.
According to Qingsong Yao, Microsoft needed to preserve the 1024 limit for clients,
In order to create more than 1024 columns, you have to have a columnset column defined in the table. We explict request this because the client can not retrieve more than 1024 columns, and we do not want people facing the issue of not being to retrieving data. Once the table has columnset defined, the select * will hide all sparse columns, and only see the column set. Neanwhile, people can still select individual sparse columns in their query.
Column sets have to part of a table's original design. If a table has any sparse columns in it, a column set cannot be added. Though once created, newly added sparse columns will be automatically added to the column set.
Since column sets appear as XML, care must be taken to not change XML in such a way that it cannot be mapped back to the underlying columns.
Using Drools? See what you're missing! Get the Power of Drools with the Assurance of Red Hat
Case Study: IBM's Agile Transformation
Agile Development: A Manager's Roadmap for Success
Monitor your Production Java App - includes JMX! Low Overhead - Free download
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Kevlin Henney examines code samples to see what can be learned from them starting from the premise that one won’t write great code unless he knows how to read it.
Jason Ayers share the observations he made watching a team of developers collaborating in real time on the same code base, pushing XP, pair programming and continuous integration to their extremes.
Michael Snoyman presents Yesod, a web framework written in Haskell and containing a web server, templating, ORM, libraries (templating, gravatar, etc.).
Richard Kreuter and Kyle Banker on how to avoid classical RDBMS transactional systems by using compensation mechanisms, transactional messaging or transactional procedures.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
One category of risk that project teams need to ensure they address is business value failure – delivering a product that fails to provide value for the business investor.
InfoQ spoke to the authors of Software Systems Architecture on a couple of new topics, the System Context viewpoint and Agile, which have been added to the second edition.
No comments
Watch Thread Reply