Tags: python, sql, postgresql, bulk update

Bulk update Postgres column from python dataframe


I am using the Python code below to update a Postgres DB column value based on id. This loop has to run for thousands of records, and it is taking a long time.

Is there a way to pass an array of dataframe values instead of looping over each row?

    for i in range(0, len(df)):
        QUERY = """UPDATE "Table" SET "value"='%s' WHERE "Table"."id"='%s'""" % (df['value'][i], df['id'][i])
        cur.execute(QUERY)
        conn.commit()  # one commit per row adds overhead

Solution

  • It depends on the library you use to communicate with PostgreSQL, but bulk inserts are usually much faster via the COPY FROM command.

    If you use psycopg2, it is as simple as the following:

    cursor.copy_from(io.StringIO(string_variable), "destination_table", columns=('id', 'value'))
    

    where string_variable is a tab- and newline-delimited dataset like 1\tvalue1\n2\tvalue2\n.
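    For example, one way to build that buffer from the question's dataframe is with pandas' to_csv (a minimal sketch; df, cur, and conn come from the question, while destination_table and the id/value column names are assumptions):

        import io

        buf = io.StringIO()
        # Write tab-separated rows with no header or index; assumes the
        # values themselves contain no tabs or newlines
        df[['id', 'value']].to_csv(buf, sep='\t', header=False, index=False)
        buf.seek(0)  # rewind so copy_from reads from the start
        cur.copy_from(buf, 'destination_table', columns=('id', 'value'))
        conn.commit()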

    To achieve a performant bulk update I would do the following (a full sketch follows the list):

    1. Create a temporary table whose columns match the destination, e.g. CREATE TEMPORARY TABLE tmp_table (id integer, value text);

    2. Insert records with copy_from;

    3. Update the destination table in a single statement: UPDATE destination_table SET value = t.value FROM tmp_table t WHERE destination_table.id = t.id; (or any preferred equivalent syntax).
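
    Putting the three steps together, a minimal end-to-end sketch (destination_table, the column types, and the connection parameters are assumptions; df is the dataframe from the question):

        import io

        import psycopg2

        conn = psycopg2.connect('dbname=mydb user=myuser')  # hypothetical connection details
        cur = conn.cursor()

        # 1. Temporary table with the same column types as the destination;
        #    it is dropped automatically at the end of the session
        cur.execute('CREATE TEMPORARY TABLE tmp_table (id integer, value text);')

        # 2. Bulk-load the dataframe into the temp table via COPY
        buf = io.StringIO()
        df[['id', 'value']].to_csv(buf, sep='\t', header=False, index=False)
        buf.seek(0)
        cur.copy_from(buf, 'tmp_table', columns=('id', 'value'))

        # 3. One set-based UPDATE joined on id, instead of thousands of
        #    single-row statements each followed by a commit
        cur.execute('''
            UPDATE destination_table AS d
            SET value = t.value
            FROM tmp_table t
            WHERE d.id = t.id;
        ''')

        conn.commit()
        cur.close()
        conn.close()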