Search code examples
mysqldjangodjango-database

Django - There was a decent size cascade delete, now how do I fix this?


In my Django web application, I had a user delete a few records that seemed safe to delete. Turns out they were connected via ForeignKeyField fields to a fairly large set of other, highly important records. I now know that I can manage how deletes are handled, but this knowledge comes after this event occurred.

My main question is: is there an easy way to fix this or is it simply a matter of meticulously restoring each record, one by one, from backups?

More Details

Every night, I do a backup of the MySQL database using mysqldump. So I have a backup of all the data the day before this happened. The problem is that these backup files will restore a database in full. Given that we didn't notice the problem for a week or so, restoring the whole database I think is not an option since it would override other legitimate changes from the day the deletes happened to now.

I think my only option is to manually, one by one, pick out the records from the MySQL dump file, and manually INSERT them back into the MySQL DB. This feels like a bad idea since it's heavily prone to human error - my own typing.

Is this the only way or is there a better way?!


Solution

  • The approach I would take would be restore a backup to a different database. (The way we have our nightly mysqldump job configured, a separate dump for each database, and the SQL in the dump file doesn't contain any reference to the database name, so it's easy for us to create new database, e.g.

    CREATE DATABASE restore_YYYYMMDD_dbname ;
    

    And then run the gzipped mysqldump into the new "restore" database:

    gunzip -c dbname.backup_YYYMMDD.sql.gz | \
      mysql u root -pSECRET -c --database restore_YYYYMMDD_dbname 
    

    Obviously, we need sufficient disk space, and that will crank for however long it cranks.

    Then I can write SQL to discover the deleted rows. Since we have a unique id column as a PRIMARY KEY in almost every table, we just use an anti-join to find rows in the restored table which don't have a corresponding row in the current database table

    For example:

    SELECT r.*
      FROM restore_YYYYMMDD_dbname.mytable r
      LEFT
      JOIN dbname.mytable t
        ON t.id = r.id
     WHERE t.id IS NULL
    

    We may not want to restore every one of those rows, we can tweak the query to add some additional predicates to the WHERE clause to get it down to the rows we actually want. Then we can use that query as a source for an INSERT ... SELECT

    INSERT INTO dbname.mytable 
    SELECT r.*
      FROM ...
    

    We have to do each table in the right order, so we don't violate foreign key constraints. (We could use SET FOREIGN_KEY_CHECKS=0, if we're sure we know what we're doing; but it's safer just to do the operations in the right order.

    Finding "changed" rows is a little more complicated than the deleted rows, but we can do the same thing, writing queries to do that as well.


    The way we have our mysqldump process setup, it's pretty straightforward to do this. It's still a manual process, but we let SQL do a lot of the tedious work for us.

    If you've not tested restoring your database from a mysqldump to a different database, you may want to test that in a DIFFERENT environment first, so you don't inadvertently mess up the current database.