Deleting Data Is Not a Recommended Practice
Oren Eini, alias Ayende Rahien, encourages developers to avoid database soft deletes leaving the reader with the impression that hard deletes are an option. Reacting to Ayende’s blog post, Udi Dahan strongly advices to completely avoid data deletion.
The advocates of soft delete operations suggest adding an IsDeleted column to the table leaving the data intact. A row is considered as deleted if the IsDeleted flag is set. Ayende considers this approach as “simple, easy to understand, quick to implement and explain” but “quite often, wrong”. The problem is:
that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues. …
When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.
If a developer is requested to delete data from the database, and if soft deletes are not recommended, then he is left with hard deletes. To preserve data integrity, he should delete the row and all related data in cascade. Udi Dahan draws the attention to the fact that the real world does not cascade:
Let’s say our marketing department decides to delete an item from the catalog. Should all previous orders containing that item just disappear? And cascading farther, should all invoices for those orders be deleted as well? Going on, would we have to redo the company’s profit and loss statements?
The problem seems to be with the interpretation of “delete”. Dahan gives the following example:
What I mean by ‘delete’ is that the product should be discontinued. We don’t want to sell this line of product anymore. We want to get rid of the inventory we have, but not order any more from our supplier. The product shouldn’t appear any more when customers do a product search or category listing, but the guys in the warehouse will still need to manage these items in the interim. It’s much shorter to just say ‘delete’ though.
He continues by giving a correct interpretation of user’s intention:
Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.
Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.
Jobs aren’t deleted – they’re filled (or their requisition is revoked).
In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.
Instead of using the IsDeleted flag, Dahan suggests using a field containing the status of the related data: active, discontinued, cancelled, deprecated, etc. Such a status field allows the user to look into the past to trace old data and make decision based on it.
Deleting data may have negative consequences beside breaking data integrity. Dahan’s recommendation is to leave all data in the database: “Don’t delete. Just don’t.”
Interesting ideas, but ...
Accountants don't use erasers
You have to learn to live with all your past decisions...
Re: Interesting ideas, but ...
the thoughts are a bit muddled and oversimplified. Renaming the IsDeleted flag to something with more business meaning makes sense but it gives very little benefit in terms of history and auditing (as the article suggests) since only the last update is saved. It also doesn't address the issues raised by Ayende about the potential for corrupt data due to inconsistent "status fields" (whatever they might be called).
I agree. To be honest, I don't understand why would anyone use "delete" in a discussion to refer to a terminal state in a business transaction or entity lifecycle. Cancelation, expiration, termination, completion, et al, these are not, and have never been a form of deletion.
I very much agree with Ayende and Dahan, but I don't think their message is clear enough.
First thing: A deletion is a deletion. We should never engage in calling these events or states as "delete". The language that Dahan chose to use at several points make things more complicated than they really are.
OTH, he is right on the money on that data and tasks should model business specific statuses.
In the defense of soft deletions: Assuming there is a legitimate need to perform a deletion (hard or soft) as an event distinct from a business specific event or status,
there should not be complications. Just as actual deletions are cascaded and bounded by FK constrains, so can updates be. For those tables that require soft deletion, encapsulate the cascading soft-deletion in stored procedures. That is, for those cases, there should never be a case where application code (.ie. Java/C#/whatever) issues an SQL UPDATE statement directly. Inconvenient but necessary.
Also, lets' consider Ayende's example, Customer's LastOrder pointing to a soft-deleted order. This is more of a business rule than anything else. Do the set of business rules allow the deletion (hard or soft) of an order at all? What do the set of business rules say about a Customer's LastOrder field in the event the order it refers to gets hard/soft deleted.
Just as with hard deletions, soft deletions are subject to both data integrity and business requirements specific to a domain.
This goes back again to Dahan's suggestion of using business statuses. I only had to work one time with soft deletes (in FoxPro) as a distinct event or status. Cascading had to be done programmatically (no triggers back then), but the model was comprehensible.
but it gives very little benefit in terms of history and auditing (as the article suggests) since only the last update is saved.
I agree Frank. I think it goes back to the terminology and samples of choice. By itself, soft-deletions would not help with history or auditing. One would have to couple that with replacing UPDATE operations with 1) an INSERT and then an UPDATE isDeleted=Y on the previous row (ugh!)
Not everything is an account or order
Re: Interesting ideas, but ...
Admittedly, this would not address the foreign key scenario, but I believe that this would require the record to not even be deleted in the first place. If you want referential integrity, then partial removal of a related data set is not acceptable.
Soft delete and FKs dont go together
But what bothers me is usage of Foreign keys with soft deletes.
If you mark a row as deleted- the FKs become ineffective
The database thinks that the data has not been deleted and the application knows the data is not deleted. This will throw your data model for a toss.
Not a practical problem in most case- unless you are a purist.
Re: Soft delete and FKs dont go together