BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News Deleting Data Is Not a Recommended Practice

Deleting Data Is Not a Recommended Practice

Leia em Português

This item in japanese

Bookmarks

Oren Eini, alias Ayende Rahien, encourages developers to avoid database soft deletes leaving the reader with the impression that hard deletes are an option. Reacting to Ayende’s blog post, Udi Dahan strongly advices to completely avoid data deletion.

The advocates of soft delete operations suggest adding an IsDeleted column to the table leaving the data intact. A row is considered as deleted if the IsDeleted flag is set. Ayende considers this approach as “simple, easy to understand, quick to implement and explain” but “quite often, wrong”. The problem is:

that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues. …

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

If a developer is requested to delete data from the database, and if soft deletes are not recommended, then he is left with hard deletes. To preserve data integrity, he should delete the row and all related data in cascade. Udi Dahan draws the attention to the fact that the real world does not cascade:

Let’s say our marketing department decides to delete an item from the catalog. Should all previous orders containing that item just disappear? And cascading farther, should all invoices for those orders be deleted as well? Going on, would we have to redo the company’s profit and loss statements?

Heaven forbid.

 The problem seems to be with the interpretation of “delete”. Dahan gives the following example:

What I mean by ‘delete’ is that the product should be discontinued. We don’t want to sell this line of product anymore. We want to get rid of the inventory we have, but not order any more from our supplier. The product shouldn’t appear any more when customers do a product search or category listing, but the guys in the warehouse will still need to manage these items in the interim. It’s much shorter to just say ‘delete’ though.

He continues by giving a correct interpretation of user’s intention:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Instead of using the IsDeleted flag, Dahan suggests using a field containing the status of the related data: active, discontinued, cancelled, deprecated, etc. Such a status field allows the user to look into the past to trace old data and make decision based on it.

Deleting data may have negative consequences beside breaking data integrity. Dahan’s recommendation is to leave all data in the database: “Don’t delete. Just don’t.”

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Interesting ideas, but ...

    by Frank Smith,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    the thoughts are a bit muddled and oversimplified. Renaming the IsDeleted flag to something with more business meaning makes sense but it gives very little benefit in terms of history and auditing (as the article suggests) since only the last update is saved. It also doesn't address the issues raised by Ayende about the potential for corrupt data due to inconsistent "status fields" (whatever they might be called).

  • Accountants don't use erasers

    by Fabio Ferrari,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    This Pat Helland post explains that you can never really delete anything in the real world.
    You have to learn to live with all your past decisions...

  • Re: Interesting ideas, but ...

    by Luis Espinal,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    the thoughts are a bit muddled and oversimplified. Renaming the IsDeleted flag to something with more business meaning makes sense but it gives very little benefit in terms of history and auditing (as the article suggests) since only the last update is saved. It also doesn't address the issues raised by Ayende about the potential for corrupt data due to inconsistent "status fields" (whatever they might be called).


    I agree. To be honest, I don't understand why would anyone use "delete" in a discussion to refer to a terminal state in a business transaction or entity lifecycle. Cancelation, expiration, termination, completion, et al, these are not, and have never been a form of deletion.

    I very much agree with Ayende and Dahan, but I don't think their message is clear enough.

    First thing: A deletion is a deletion. We should never engage in calling these events or states as "delete". The language that Dahan chose to use at several points make things more complicated than they really are.

    OTH, he is right on the money on that data and tasks should model business specific statuses.

    In the defense of soft deletions: Assuming there is a legitimate need to perform a deletion (hard or soft) as an event distinct from a business specific event or status,
    there should not be complications. Just as actual deletions are cascaded and bounded by FK constrains, so can updates be. For those tables that require soft deletion, encapsulate the cascading soft-deletion in stored procedures. That is, for those cases, there should never be a case where application code (.ie. Java/C#/whatever) issues an SQL UPDATE statement directly. Inconvenient but necessary.

    Also, lets' consider Ayende's example, Customer's LastOrder pointing to a soft-deleted order. This is more of a business rule than anything else. Do the set of business rules allow the deletion (hard or soft) of an order at all? What do the set of business rules say about a Customer's LastOrder field in the event the order it refers to gets hard/soft deleted.

    Just as with hard deletions, soft deletions are subject to both data integrity and business requirements specific to a domain.

    This goes back again to Dahan's suggestion of using business statuses. I only had to work one time with soft deletes (in FoxPro) as a distinct event or status. Cascading had to be done programmatically (no triggers back then), but the model was comprehensible.

    but it gives very little benefit in terms of history and auditing (as the article suggests) since only the last update is saved.


    I agree Frank. I think it goes back to the terminology and samples of choice. By itself, soft-deletions would not help with history or auditing. One would have to couple that with replacing UPDATE operations with 1) an INSERT and then an UPDATE isDeleted=Y on the previous row (ugh!)

  • Not everything is an account or order

    by Ravi Ok,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Consider Emails. I get about 7000 emails per year. I delete 98% of them. If hotmail decides to keep them in my trash can forever, it makes no sense. But Udi might say, hey wait, what if a court wants hotmail to trace all emails for an account ? Aah thats where the backup comes into picture. So "delete" is specific to the context ? Obviously both Udi and Ayende know this, but try to over simplify the issue ?

  • Re: Interesting ideas, but ...

    by Bediako George,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    This could be handled by implementing change detection on all objects. Whatever the action, create, update, delete, store both the state of the object and the action performed on the object in a change event table. That way deleted information can always be retrieved.

    Admittedly, this would not address the foreign key scenario, but I believe that this would require the record to not even be deleted in the first place. If you want referential integrity, then partial removal of a related data set is not acceptable.

  • Agree

    by Ed Pichler,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I completly agree.

  • Soft delete and FKs dont go together

    by Rajiv Narula,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I don't have a problem with Soft Deletes and in fact like to use them often
    But what bothers me is usage of Foreign keys with soft deletes.
    If you mark a row as deleted- the FKs become ineffective

    The database thinks that the data has not been deleted and the application knows the data is not deleted. This will throw your data model for a toss.
    Not a practical problem in most case- unless you are a purist.

  • Re: Soft delete and FKs dont go together

    by Ashton Thorogood,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    The database is not the entire application. It is the repository of application information. "Soft deletes" are a database developer's way of implementing state transitions. These states should be implemented in the application, while the database is just recording the business activities. In other words, the application needs to ask questions correctly - not just arbitrarily join across foreign keys. It must expose the necessary interfaces with correct semantics. For example, there's a big difference between the application saying in effect: "Show me all orders" and "Show me all Open Orders for active products".

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT