A couple of weeks ago Pat Helland blogged about a short presentation he gave named "Normalization is for Sissies". Pat started by stating (the obvious) that we normalize to eliminate update anomalies, but then proceeded to make a couple of interesting observations:
- Many types of computing problems are append-only in nature, or in his words, "Accountants don't use erasers". Hence there is no need to normalize immutable data (unless you really need the space).
- Developers work with "complete" business objects, and these business objects are denormalized, complete views anyway.
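To make the append-only idea concrete, here is a minimal sketch (the table and column names are hypothetical, not from Pat's presentation) of an insert-only log where each row holds a complete, denormalized business object:

```python
import json
import sqlite3

# Hypothetical append-only store: every write is an INSERT, nothing is
# ever UPDATEd or DELETEd, and each row carries the whole denormalized
# business object (here serialized as JSON).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_log (
        entry_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        order_id    TEXT NOT NULL,
        recorded_at TEXT NOT NULL DEFAULT (datetime('now')),
        payload     TEXT NOT NULL  -- the complete order, customer and lines included
    )
""")

order = {
    "order_id": "ORD-42",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "lines": [{"sku": "BOOK-1", "qty": 2, "price": 19.90}],
}

# "Accountants don't use erasers": corrections become new rows, not edits.
conn.execute(
    "INSERT INTO order_log (order_id, payload) VALUES (?, ?)",
    (order["order_id"], json.dumps(order)),
)
conn.commit()

# The latest state of an order is simply its most recent entry.
row = conn.execute(
    "SELECT payload FROM order_log WHERE order_id = ? ORDER BY entry_id DESC LIMIT 1",
    ("ORD-42",),
).fetchone()
print(json.loads(row[0]))
```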
Andres Aguiar commented that, in addition to there being no need to normalize immutable data, you don't even need to delete or update:
"Another [idea] is that you actually don't need to delete/update the database. 'Deleting' a row means setting an InvalidationTimestamp = now(), and updating a row means setting InvalidationTimestamp = now() and inserting a new row with SinceTimestamp = now() and InvalidationTimestamp = null (you actually need two sets of dates, but that's for another post). Now, if you put the two ideas together, all the data is immutable, so you don't need to normalize anything."

Analyzing this, one barrier Andres mentioned is that databases are not ready for the large number of columns per table such a scenario would generate, though that is solvable if you apply archiving to older data.
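A rough sketch of Andres' scheme (only SinceTimestamp and InvalidationTimestamp come from his comment; the table and the other column names are hypothetical), again with SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id           TEXT NOT NULL,
        name                  TEXT NOT NULL,
        SinceTimestamp        TEXT NOT NULL,
        InvalidationTimestamp TEXT          -- NULL marks the current version
    )
""")

def insert_customer(customer_id, name):
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, datetime('now'), NULL)",
        (customer_id, name),
    )

def update_customer(customer_id, new_name):
    # "Updating" never rewrites history: close the current version,
    # then append a new one.
    conn.execute(
        "UPDATE customer SET InvalidationTimestamp = datetime('now') "
        "WHERE customer_id = ? AND InvalidationTimestamp IS NULL",
        (customer_id,),
    )
    insert_customer(customer_id, new_name)

def delete_customer(customer_id):
    # "Deleting" is just invalidating the current version.
    conn.execute(
        "UPDATE customer SET InvalidationTimestamp = datetime('now') "
        "WHERE customer_id = ? AND InvalidationTimestamp IS NULL",
        (customer_id,),
    )

insert_customer("C-1", "Ada")
update_customer("C-1", "Ada Lovelace")
for row in conn.execute("SELECT * FROM customer ORDER BY rowid"):
    print(row)  # full history: the invalidated old row plus the current one
```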
Dare Obasanjo thinks that
"Database denormalization is the kind of performance optimization that should be carried out as a last resort after trying things like creating database indexes, using SQL views and implementing application specific in-memory caching. "However Dare agrees that when you need massive scale you probably need to denormalize and give Flickr as an example.
A 2004 blog post by Jason Kottke, which bears the same title as Pat's post and as the first point below, discusses Cal Henderson's presentation on Flickr's architecture. Jason explains that in Flickr's case there is a ratio of 13 selects for every insert, delete or update, which is why they chose to denormalize. Cal's points were:
* Normalised data is for sissies
* Keep multiple copies of data around
* Makes searching faster
* Have to ensure consistency in the application logic
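The last point is where the real cost lives: once the same data exists in several places, the application has to keep the copies in sync itself. A minimal sketch of that responsibility (the schema is hypothetical, not Flickr's) keeps a per-user comment count alongside the comments and updates both inside a single transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        comment_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        body       TEXT NOT NULL
    );
    -- Denormalized copy: the count could be derived from comment,
    -- but is kept separately so reads don't need an aggregate query.
    CREATE TABLE user_stats (
        user_id       INTEGER PRIMARY KEY,
        comment_count INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO user_stats (user_id) VALUES (1);
""")

def add_comment(user_id, body):
    # The application, not the database schema, is now responsible for
    # keeping the two representations consistent, so both writes go
    # into one transaction.
    with conn:
        conn.execute(
            "INSERT INTO comment (user_id, body) VALUES (?, ?)",
            (user_id, body),
        )
        conn.execute(
            "UPDATE user_stats SET comment_count = comment_count + 1 "
            "WHERE user_id = ?",
            (user_id,),
        )

add_comment(1, "Normalization is for sissies?")
print(conn.execute("SELECT comment_count FROM user_stats WHERE user_id = 1").fetchone())
# -> (1,): the fast, denormalized count matches the comment table
```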
To sum up, as Pat finished his presentation:
"people normalize 'cuz their professor said to"Is it time yet to rethink the mantra of database normalization? Some of the larger web-sites seem to think so, what's your take?