Search code examples
mysqlinsertsql-insertupsert

Mysql fastest technique for insert, replace, on duplicate of mass records


I know there are a lot related questions with many answers, but I have a bit of a more nuanced question. I have been doing reading on different insert techniques for mass records, but are there limits on how big a query insert can be? Can the same technique be used for REPLACE and INSERT ...ON DUPLICATE KEY UPDATE ... ? Is there a faster method?

Table:

+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| a         | int(11)     | NO   | PRI | NULL    | auto_increment |
| b         | int(11)     | YES  |     | NULL    |                |
| c         | int(11)     | YES  |     | NULL    |                |

#1

1) "INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3);"
2) "INSERT INTO TABLE COLUMNS (a,b,c) values (5,6,7);"
3) "INSERT INTO TABLE COLUMNS (a,b,c) values (8,9,10);"
 ...
10,000) "INSERT INTO TABLE COLUMNS (a,b,c) values (30001,30002,30003);"

or

#2 - should be faster, but is there a limit?

"INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3),(4,5,6),(8,9,10)....(30001,30002,30003)" ;

From a scripting perspective (PHP), using #2, is it better to loop through and queue up 100 entries (1000 times)...or a 1000 entries (100 times), or just all 10,000 at once? Could this be done with 100,000 entries?

Can the same be used with REPLACE:

"REPLACE INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6)(7,8,9),...(30001,30002,30003);"

Can it also be used with INSERT ON DUPLICATE?

INSERT INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003) ON DUPLICATE KEY UPDATE (b=2,c=3)(b=5,c=6),(b=8,c=9),....(b=30002,c=30003) ?

For any and all of the above (assuming the replace/on duplicate are valid), are there faster methods to achieve the inserts?


Solution

  • #1 (single-row inserts) -- Slow. A variant is INSERT IGNORE -- beware: it burns AUTO_INCREMENT ids.

    #2 (batch insert) -- Faster than #1 by a factor of 10. But do the inserts in batches of no more than 1000. (After that, you are into "diminishing returns" and may conflict with other activities.

    #3 REPLACE -- Bad. It is essentially a DELETE plus an INSERT. Once IODKU was added to MySQL, I don't think there is any use for REPLACE. All the old AUTO_INCREMENT ids will be tossed and new ones created.

    #4 IODKU (Upsert) -- [If you need to test before Insert.] It can be batched, but not the way you presented it. (There is no need to repeat the b and c values.)

    INSERT INTO (
    INSERT INTO TABLE (a, b, c)
         VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003)
         ON DUPLICATE KEY UPDATE
             b = VALUES(b),
             c = VALUES(c);
    

    Or, in MySQL 8.0, the last 2 lines are:

             b = NEW.b,
             c = NEW.c;
    

    IODKU also burns ids.

    MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE discusses a 2-step process of LOAD + IODKU. Depending on how complex the "updates" are, 2+ steps may be your best answer.

    #5 LOAD DATA -- as Bill mentions, this is a good way if the data comes from a file. (I am dubious about its speed if you also have to write the data to a file first.) Be aware of the usefulness of @variables to make minor tweaks as you do the load. (Eg, STR_TO_DATE(..) to fix a DATE format.)

    #6 INSERT ... SELECT ...; -- If the data is already in some other table(s), you may as well combine the Insert and Select. This works for IODKU, too.

    As a side note, if you need to get AUTO_INCREMENT ids of each batched row, I recommend some variant on the following. It is aimed at batch-normalization of id-name pairs that might already exist in the mapping table. Normalization