Search code examples
pythonpostgresqlsql-updatepsycopg2

psycopg2: update multiple rows with one query


I tried to update multiple rows (approx. 350000) with a single query by implementing the following function:

def update_items(rows_to_update):
    sql_query = """UPDATE contact as t SET
                    name = e.name
                    FROM (VALUES %s) AS e(id, name)
                    WHERE e.id = t.id;"""

    conn = get_db_connection()
    cur = conn.cursor()

    psycopg2.extras.execute_values (
    cur, sql_query, rows_to_update, template=None, page_size=100
    )

While trying to run the function above, only 31 records were updated. Then, I tried to update row by row with the following function:

def update_items_row_by_row(rows_to_update):
    sql_query = """UPDATE contact SET name = %s WHERE id = %s"""
    conn = get_db_connection()
    with tqdm(total=len(rows_to_update)) as pbar:
        for id, name in rows_to_update:
            cur = conn.cursor()
            # execute the UPDATE  statement
            cur.execute(sql_query, (name, id))
            # get the number of updated rows
            # Commit the changes to the database
            conn.commit()
            cur.close()
            pbar.update(1)

The latter has updated all the records so far but is very slow (estimated to end in 9 hours). Does anyone know what is the efficient way to update multiple records?


Solution

  • By splitting the list into chunks of size equal to page_size, it worked well:

    def update_items(rows_to_update):
        sql_query = """UPDATE contact as t SET
                        name = data.name
                        FROM (VALUES %s) AS data (id, name)
                        WHERE t.id = data.id"""
        conn = get_db_connection()
        cur = conn.cursor()
        n = 100
        with tqdm(total=len(rows_to_update)) as pbar:
            for i in range(0, len(rows_to_update), n):
                psycopg2.extras.execute_values (
                cur, sql_query, rows_to_update[i:i + n], template=None, page_size=n
                )
                conn.commit()
                pbar.update(cur.rowcount)
        cur.close()
        conn.close()