I know there are a lot related questions with many answers, but I have a bit of a more nuanced question. I have been doing reading on different insert techniques for mass records, but are there limits on how big a query insert can be? Can the same technique be used for REPLACE
and INSERT ...ON DUPLICATE KEY UPDATE ...
? Is there a faster method?
Table:
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| a | int(11) | NO | PRI | NULL | auto_increment |
| b | int(11) | YES | | NULL | |
| c | int(11) | YES | | NULL | |
#1
1) "INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3);"
2) "INSERT INTO TABLE COLUMNS (a,b,c) values (5,6,7);"
3) "INSERT INTO TABLE COLUMNS (a,b,c) values (8,9,10);"
...
10,000) "INSERT INTO TABLE COLUMNS (a,b,c) values (30001,30002,30003);"
or
#2 - should be faster, but is there a limit?
"INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3),(4,5,6),(8,9,10)....(30001,30002,30003)" ;
From a scripting perspective (PHP), using #2, is it better to loop through and queue up 100 entries (1000 times)...or a 1000 entries (100 times), or just all 10,000 at once? Could this be done with 100,000 entries?
Can the same be used with REPLACE
:
"REPLACE INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6)(7,8,9),...(30001,30002,30003);"
Can it also be used with INSERT ON DUPLICATE
?
INSERT INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003) ON DUPLICATE KEY UPDATE (b=2,c=3)(b=5,c=6),(b=8,c=9),....(b=30002,c=30003) ?
For any and all of the above (assuming the replace/on duplicate are valid), are there faster methods to achieve the inserts?
#1 (single-row inserts) -- Slow. A variant is INSERT IGNORE
-- beware: it burns AUTO_INCREMENT
ids.
#2 (batch insert) -- Faster than #1 by a factor of 10. But do the inserts in batches of no more than 1000. (After that, you are into "diminishing returns" and may conflict with other activities.
#3 REPLACE
-- Bad. It is essentially a DELETE
plus an INSERT
. Once IODKU was added to MySQL, I don't think there is any use for REPLACE
. All the old AUTO_INCREMENT
ids will be tossed and new ones created.
#4 IODKU (Upsert) -- [If you need to test before Insert.] It can be batched, but not the way you presented it. (There is no need to repeat the b
and c
values.)
INSERT INTO (
INSERT INTO TABLE (a, b, c)
VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003)
ON DUPLICATE KEY UPDATE
b = VALUES(b),
c = VALUES(c);
Or, in MySQL 8.0, the last 2 lines are:
b = NEW.b,
c = NEW.c;
IODKU also burns ids.
MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE discusses a 2-step process of LOAD
+ IODKU. Depending on how complex the "updates" are, 2+ steps may be your best answer.
#5 LOAD DATA
-- as Bill mentions, this is a good way if the data comes from a file. (I am dubious about its speed if you also have to write the data to a file first.) Be aware of the usefulness of @variables to make minor tweaks as you do the load. (Eg, STR_TO_DATE(..)
to fix a DATE
format.)
#6 INSERT ... SELECT ...;
-- If the data is already in some other table(s), you may as well combine the Insert and Select. This works for IODKU, too.
As a side note, if you need to get AUTO_INCREMENT
ids of each batched row, I recommend some variant on the following. It is aimed at batch-normalization of id-name pairs that might already exist in the mapping table. Normalization