Search code examples
mysqlforeign-keysrelational-databasereferential-integrity

mySQL repair new database integrity implementing foreign keys


My problem is fairly simple. I was been given around 95% of a fairly large, working data set. And I wrote out a parser, set my data out into CSVs, wrote my bulk loads, and have everything set up save the foreign keys.

Now, part of the lost 5% is from one table, (say Items) that used to contain information that another table, (say Users) does still have. And since referential integrity is broken between the tables, the foreign keys constraint can't be added, throwing an error 1452, as the latter table references a record missing in the former.

Now, I don't mind losing this 5% and want to have the 95%. Is there a simple way to just throw out the broken records on adding the foreign key? Or can I do this through some SQL query? Or do I have to go back and re-parse everything to throw out the bad records? I feel like the first or second should be an option. But I just can't find it, as I'm no DB admin... Am I missing something stupidly simple?


Solution

  • You could use INSERT IGNORE ... or if you use bulk data loads use LOAD DATA INFILE 'filename' IGNORE ... or if you use the mysqlimport program, there's a --ignore option.

    What this does is skips any individual rows if there is an error when trying to import them, and continue to the next rows.

    Another alternative is to use SET foreign_key_checks=0; before using INSERT or LOAD DATA INFILE. This will allow the rows to be loaded, even though they reference a value that doesn't exist. Of course this leads to another problem: you will have rows of data that are "orphans" because their parent is missing. These orphan rows tend to go undetected, because they naturally get omitted from joins.

    You can detect orphaned rows like this:

    SELECT Items.*
    FROM Items
    LEFT OUTER JOIN Users ON Items.user_id = Users.id
    WHERE Users.id IS NULL;
    

    This kind of query returns only the rows in Items where there is no matching row in Users, thus the outer join returns NULL for all columns of Users.*. You can then review those rows and decide what you want to do with those rows, such as UPDATE them to change the user_id to something else, or perhaps just DELETE the rows.