Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage News Deleting Data Is Not a Recommended Practice

Deleting Data Is Not a Recommended Practice

Leia em Português

This item in japanese

Oren Eini, alias Ayende Rahien, encourages developers to avoid database soft deletes leaving the reader with the impression that hard deletes are an option. Reacting to Ayende’s blog post, Udi Dahan strongly advices to completely avoid data deletion.

The advocates of soft delete operations suggest adding an IsDeleted column to the table leaving the data intact. A row is considered as deleted if the IsDeleted flag is set. Ayende considers this approach as “simple, easy to understand, quick to implement and explain” but “quite often, wrong”. The problem is:

that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues. …

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

If a developer is requested to delete data from the database, and if soft deletes are not recommended, then he is left with hard deletes. To preserve data integrity, he should delete the row and all related data in cascade. Udi Dahan draws the attention to the fact that the real world does not cascade:

Let’s say our marketing department decides to delete an item from the catalog. Should all previous orders containing that item just disappear? And cascading farther, should all invoices for those orders be deleted as well? Going on, would we have to redo the company’s profit and loss statements?

Heaven forbid.

 The problem seems to be with the interpretation of “delete”. Dahan gives the following example:

What I mean by ‘delete’ is that the product should be discontinued. We don’t want to sell this line of product anymore. We want to get rid of the inventory we have, but not order any more from our supplier. The product shouldn’t appear any more when customers do a product search or category listing, but the guys in the warehouse will still need to manage these items in the interim. It’s much shorter to just say ‘delete’ though.

He continues by giving a correct interpretation of user’s intention:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Instead of using the IsDeleted flag, Dahan suggests using a field containing the status of the related data: active, discontinued, cancelled, deprecated, etc. Such a status field allows the user to look into the past to trace old data and make decision based on it.

Deleting data may have negative consequences beside breaking data integrity. Dahan’s recommendation is to leave all data in the database: “Don’t delete. Just don’t.”

Rate this Article