I have a large table (> 50M rows) that I'm trying to convert into a new table format. As part of this, the row ids returned from the first insert need to be used in a second insert into a second table.
I am using a CTE that looks something like this:
WITH inserted_data AS (
INSERT INTO table1 (column1, column2, column3, column4)
SELECT value1, value2, value3, value4
FROM original_table
RETURNING rowid, column4
)
INSERT INTO table2 (table1_id, column4)
SELECT rowid, column4
FROM inserted_data;
The problem is, this is unacceptably slow. For only 4800 rows it takes 21 seconds, while for 9600 rows it takes around 50 seconds. At this rate I expect it would take around 3 days for 50 million records. I have a time constraint of around 2 hours.
If I perform just the first insert (no CTE), the query is 100x times faster, around 200 milliseconds and 500 milliseconds for 4800 and 9600 rows. I know that the time for the second insert would also be negligible. At this rate the query would finish in the allotted time.
The question is, how can I rewrite the query to perform at the rate the individual queries could complete in. Shuffling all the data out of the database into an external program would be a hassle and error prone as well as require extra resources. If I do something like write a function with a loop then I'm not inserting in one go and i expect poor performance there as well. I don't think using a temporary table will help, because the issue seems to be with the mere existence of the CTE.
I tried this:
INSERT INTO table2 (table1_id, column4)
SELECT rowid, column4
FROM (
WITH inserted_data AS (
INSERT INTO table1 (column1, column2, column3, column4)
SELECT value1, value2, value3, value4
FROM original_table
RETURNING rowid, column4
)
)
But this gives me:
syntax error at or near "INTO"
How about separating the inserts in two statements? The second insert
could look like:
INSERT INTO table2
(table1_id, column4)
SELECT t1.rowid
, ot.column4
FROM original_table ot
JOIN table1 t1
ON t1.column1 = ot.value1
AND t1.column2 = ot.value2
AND t1.column3 = ot.value3
AND t1.column4 = ot.value4
For each row in original_table
, the join conditions looks for the row inserted into table1
.