
Best practices for data deletion on user account termination


On a site that has a fair share of user-generated content (forum threads, blog comments, submitted articles, private and public messages, user profiles, etc.), what is the best practice for handling that data when a user terminates their account?

I'm not asking for legal advice, and I don't view this as a legal question so much as a question of striking a balance between the user, other users, and the site; terms of use can be drawn up after that balance is struck. Some of the following scenarios should be considered when a user deletes their account:

  • Private messages between users - Should the conversation trail be deleted? If so, how do you account for cases of harassment where legal evidence is needed?
  • Forum questions or answers - If the user asked a question, should the entire thread be deleted? If they answered a question, should the answer be deleted?

I'm asking this question as I'm implementing user accounts in a CMS. I know that Facebook recently ran into trouble over changes to their terms of use, but how do you balance a desire to delete with the needs and investment of the other users who also participated?


Solution

  • Generally speaking, you rarely delete anything from a database. You can mark a record as deleted, but you keep it in the database, at least for a time.

    There are many reasons for this. Some of them are legal: you may be required to keep data for a given period. Some of them are technical. Sometimes it's just a safeguard: you may need to restore the information. The user may ask for their account to be reopened, or the account may have been locked for spamming when it had in fact been compromised, and it has since been restored to its owner.

    Old data may be deleted or archived but this may take months or even years.

    Personally, I just give relevant data a status column (e.g. 1 = active, 0 = deleted) and then change the status rather than delete the row 99% of the time.

    Data integrity is another issue here. Let me give you an example.

    Assume you have two entities:

    User: id, nick, name, email
    Message: id, sender_id, receiver_id, subject, body
    

    You want to delete a particular User. What do you do about messages they've sent and received? Those messages will appear in someone else's inbox or sent items so you can't delete them. Do you set the relevant field in Message to NULL? That doesn't make a lot of sense either because that message did come from (or go to) somebody, even if they aren't active anymore.

    You're better off just marking that user as deleted and keeping them around. It makes this and similar situations much easier to deal with.

    You also mention forum threads and so on. You can't delete those either (unless there are other reasons to do so, such as spam or abuse) because they're content that is related to other content (e.g. forum messages that have been replied to).

    The only data you can safely and reasonably delete is child data. This is really the difference between aggregation and composition. The User and Message relationship above is aggregation. An example of composition is House and Room: delete a House and all its Rooms go too, because Rooms cannot exist without a House. This is composition or, in entity-relationship terms, a parent-child relationship.

    But you'll find more instances of aggregation than composition (in my experience) so the question becomes: what do you do with that data? It's really hard to erase all traces of someone without deleting things you shouldn't. Just mark them as deleted, locked or inactive and deal with it that way.
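
To make the status-column idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The User and Message columns follow the entities above; everything else is an assumption made for illustration: the plural table names, the user_settings table (a stand-in for composition-style child data), and the terminate_account and inbox helpers. Treat it as one possible shape, not a definitive implementation.

```python
import sqlite3

ACTIVE, DELETED = 1, 0  # mirrors the "1 = active, 0 = deleted" convention

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE users (
    id     INTEGER PRIMARY KEY,
    nick   TEXT NOT NULL,
    name   TEXT,
    email  TEXT,
    status INTEGER NOT NULL DEFAULT 1
);
-- Aggregation: a message still matters to the other party.
CREATE TABLE messages (
    id          INTEGER PRIMARY KEY,
    sender_id   INTEGER NOT NULL REFERENCES users(id),
    receiver_id INTEGER NOT NULL REFERENCES users(id),
    subject     TEXT,
    body        TEXT
);
-- Composition (hypothetical table): settings are meaningless without their user.
CREATE TABLE user_settings (
    user_id INTEGER NOT NULL REFERENCES users(id),
    key     TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (user_id, key)
);
""")


def terminate_account(conn, user_id):
    """Soft-delete the user, drop pure child data, leave messages alone."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE users SET status = ? WHERE id = ?", (DELETED, user_id))
        conn.execute("DELETE FROM user_settings WHERE user_id = ?", (user_id,))


def inbox(conn, user_id):
    """Return (subject, sender nick, sender is active) for each received message."""
    rows = conn.execute(
        """SELECT m.subject, u.nick, u.status
           FROM messages m JOIN users u ON u.id = m.sender_id
           WHERE m.receiver_id = ?
           ORDER BY m.id""",
        (user_id,),
    )
    return [(subject, nick, status == ACTIVE) for subject, nick, status in rows]


conn.executemany("INSERT INTO users (id, nick) VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO user_settings VALUES (1, 'theme', 'dark')")
conn.execute(
    "INSERT INTO messages (sender_id, receiver_id, subject, body) VALUES (1, 2, 'hi', 'hello bob')"
)
conn.commit()

terminate_account(conn, 1)
print(inbox(conn, 2))  # [('hi', 'alice', False)]

# A hard delete would orphan bob's message, so the foreign key rejects it.
try:
    conn.execute("DELETE FROM users WHERE id = 1")
except sqlite3.IntegrityError as exc:
    print("hard delete refused:", exc)
finally:
    conn.rollback()
```

Bob's inbox still shows who the message came from, and the UI can use the returned flag to render the sender as a closed account. Only the child rows in user_settings are physically removed.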