Search code examples
phpmysqlinnodb

Slow insert & update into large table


So I have written a script which import data from a csv file with more than 1.5m rows into a mysql innodb table, it will scan the CSV, then for each CSV row:

  • Search in the table to check if the record already exist.
  • If exists, update the record.
  • If not, insert new.

Then go to next row in CSV and do the above steps again.

The problem is, the table which data will be imported to also has 1.5m records, which make it too slow to import data. I tried the script and see that:

  • If import data to an empty table, it can import 500 rows per minute.
  • If import data to that 1.5m records table, it can only import 30 rows per minute.

Is there any solution to make the import progress faster?


Solution

  • The root of the problem is this:

    You search the table for the key you're about to insert, however when you perform an insert the database management system will also perform integrity checks, i.e. ensure there's no NULL values on columns marked as NOT NULL, check if foreign key constraints are not violated, and (most important in this case), ensure primary keys and unique fields are unique.

    This means that the DBMS will (among other things) search for the item when you perform the INSERT making your earlier search redundant.

    You can replace the 3 operations with a single one which would look something like:

    INSERT INTO tablename(column1,column2...) 
    VALUES (v1,v2,....) 
    ON DUPLICATE KEY UPDATE SET column1=v1...
    

    Also useful is to know that if you say column1=VALUES(column1) this means "use the value that I've specified in the VALUES() part of the insert.