Search code examples
pythondatabasepostgresqlpsycopg2executemany

Too many server roundtrips w/ psycopg2


I am making a script, that should create a schema for each customer. I’m fetching all metadata from a database that defines how each customer’s schema should look like, and then create it. Everything is well defined, the types, names of tables, etc. A customer has many tables (fx, address, customers, contact, item, etc), and each table has the same metadata.

My procedure now:

  1. get everything I need from the metadataDatabase.
  2. In a for loop, create a table, and then Alter Table and add each metadata (This is done for each table).

Right now my script runs in about a minute for each customer, which I think is too slow. It has something to do with me having a loop, and in that loop, I’m altering each table. I think that instead of me altering (which might be not so clever approach), I should do something like the following: Note that this is just a stupid but valid example:

for table in tables:
    con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen   date", "valid_from    timestamp"))

But it gives me this error (it seems like it reads the table name as a string in a string..):

psycopg2.errors.SyntaxError: syntax error at or near "'billing'"

LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...


Solution

  • Consider creating tables with a serial type (i.e., autonumber) ID field and then use alter table for all other fields by using a combination of sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) and regular format for data types which are not literals in SQL statement.

    from psycopg2 import sql
    
    # CREATE TABLE
    query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
    
    cur.execute(sql.SQL(query).format(shm = sql.Identifier("tester"),
                                      tbl = sql.Identifier("table")))
    
    # ALTER TABLE
    items = [("last_seen", "date"), ("valid_from", "timestamp")]    
    query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""
    
    for item in items:
        # KEEP IDENTIFIER PLACEHOLDERS
        final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=i[1])
        cur.execute(sql.SQL(final_query).format(shm = sql.Identifier("tester"), 
                                                tbl = sql.Identifier("table"),
                                                col = sql.Identifier(item[0]))
    

    Alternatively, use str.join with list comprehension for one CREATE TABLE:

    query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
           "id" serial,
           {vals}
    )"""
    
    items = [("last_seen", "date"), ("valid_from", "timestamp")]  
    val = ",\n       ".join(["{{}} {typ}".format(typ=i[1]) for i in items])
    
    # KEEP IDENTIFIER PLACEHOLDERS
    pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
    
    final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items], 
                                            shm = sql.Identifier("tester"), 
                                            tbl = sql.Identifier("table"))    
    cur.execute(final_query)
    

    SQL (sent to database)

    CREATE TABLE IF NOT EXISTS "tester"."table" (
           "id" serial,
           "last_seen" date,
           "valid_from" timestamp
    )