Search code examples
pythonpsycopg2

PostGres - Insert list of tuples using execute instead of executemany


I am inserting thousands of rows, timing and speed is very important. I have found through benchmarking that postgres can ingest my rows faster using execute() instead of executemany()

This works well for me:

...

def insert(self, table, columns, values):
    conn = self.connectionPool.getconn()
    conn.autocommit = True

    try:
        with conn.cursor() as cursor:
            query = (
                 f'INSERT INTO {table} ({columns}) '
                 f'VALUES {values} '
                 f'ON CONFLICT DO NOTHING;'
             ).replace('[', '').replace(']', '')  # Notice the replace x2 to get rid of the list brackets

            print(query)
            cursor.execute(query)
    finally:
        cursor.close()
        self.connectionPool.putconn(conn)

...

self.insert('types', 'name, created_at', rows)

After the double replace, printing query returns something like this and the rows are ingested:

INSERT INTO types (name, created_at) VALUES ('TIMER', '2022-04-09 03:19:49'), ('Sequence1', '2022-04-09 03:19:49') ON CONFLICT DO NOTHING;

Is my approach secure? Is there a more pythonic implementation using execute?


Solution

  • No, this isn’t secure or even reliable – Python repr isn’t compatible with PostgreSQL string syntax (try some strings with single quotes, newlines, or backslashes).

    • Consider passing array parameters instead and using UNNEST:

      cursor.execute(
          "INSERT INTO types (name, created_at)"
          " SELECT name, created_at FROM UNNEST (%(names)s, %(created_ats)s) AS t",
          {
              'names': ['TIMER', 'Sequence1', ...],
              'created_ats': ['2022-04-09 03:19:49', ...],
          })
      

      This is the best solution, as the query doesn’t depend on the parameters (can be prepared and cached, statistics can be easily grouped, makes the absence of SQL injection vulnerability obvious, can easily log queries without data).

    • Failing that, build a query that’s only dynamic in the number of parameters, like VALUES ((%s, %s, ...), (%s, %s, ...), ...). Note that PostgreSQL has a parameter limit, so you might need to produce these in batches.

    • Failing that, use psycopg2.sql.Literal.