I need to upsert (INSERT ... ON CONFLICT DO UPDATE
) multiple rows at once into a postgreSQL database using psycopg2. Essentially, I have a list of tuples representing "rows", and I need to insert them into the database, or update the database if there is a conflict. I need (possibly) every column to be updated (if not inserted), along with every row.
I've tried two main approaches, using psycopg2's cursor.execute()
function and execute_many()
function. First, I did the following:
upsert_statement = 'INSERT INTO table (col1, col2, col3) VALUES %s ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = ROW (excluded.*) WHERE table IS DISTINCT FROM excluded'
psycopg2.extras.execute_values(cursor, upsert_statement, values)
I create an SQL statement that inserts the values using execute_many()
(where values
passed to it is a list of tuples), and on a conflict the column values should be updated to excluded. However, I get the error SyntaxError: number of columns does not match number of values
sometimes, even though I know for a fact that the number of columns and values are the same.
So, I tried using only execute()
:
upsert_statement = f'INSERT INTO table (col1, col2, col3) VALUES (value1, value2, value3), (value4, value5, value6)... ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...'
cursor.execute(upsert_statement)
Here, I do the batch upsert as part of the SQL, and so don't have to use execute_values()
. However, I get a SyntaxError
after the DO UPDATE SET
, because I don't think it's valid to have (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...
.
What am I doing wrong? How can I bulk upsert multiple rows using psycopg2?
(I should note that in reality, (col1, col2, col3)
and (value1, value2, value3)
are dynamic, and change frequently)
You need to use table EXCLUDED
instead of value literals in your ON CONFLICT
statement:
The
SET
andWHERE
clauses inON CONFLICT DO UPDATE
have access to the existing row using the table's name (or an alias), and to the row proposed for insertion using the specialexcluded
table.
You also don't need to re-set the conflicting values, only the rest.
INSERT INTO table (col1, col2, col3)
VALUES
(value1, value2, value3),
(value4, value5, value6)
ON CONFLICT (col1) DO UPDATE
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3);
For readability, you can format your in-line SQLs if you triple-quote your f-strings. I'm not sure if and which IDEs can detect it's an in-line SQL in Python and switch syntax highlighting, but I find indentation helpful enough.
upsert_statement = f"""
INSERT INTO table (col1, col2, col3)
VALUES
({value1}, {value2}, {value3}),
({value4}, {value5}, {value6})
ON CONFLICT (col1) DO UPDATE
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3)"""
Here's a test at db<>fiddle:
drop table if exists test_70066823 cascade;
create table test_70066823 (
id integer primary key,
text_column_1 text,
text_column_2 text);
insert into test_70066823 values
(1,'first','first')
,(2,'second','second') returning *;
id | text_column_1 | text_column_2 |
---|---|---|
1 | first | first |
2 | second | second |
insert into test_70066823
values (1, 'third','first'),
(3, 'fourth','third'),
(4, 'fifth','fourth'),
(2, 'sixth','second')
on conflict (id) do update
set text_column_1=EXCLUDED.text_column_1,
text_column_2=EXCLUDED.text_column_2
returning *;
id | text_column_1 | text_column_2 |
---|---|---|
1 | third | first |
3 | fourth | third |
4 | fifth | fourth |
2 | sixth | second |
You can refer to this for improved insert performance. Inserts with a simple string-based execute
or execute_many
are the top 2 slowest approaches mentioned there.