Search code examples
pythonpostgresqlpsycopg2psql

Unable to set psycopg2 autocommit after shp2pgsql import


I am loading a shapefile into a postGIS database using shp2pgsql, piped via psql, wrapped in a python subprocess like this:

command = "shp2pgsql -s 4269 -a -D -W LATIN1 file.shp table | psql -h host -d db -U user"
p=subprocess.Popen(command, shell=True)
p.communicate()

This works perfectly, with the following output:

Loading objects...
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
SET
SET
BEGIN
COMMIT

There is no END statement, but to the best of my knowledge END and COMMIT are equivalent.

I then want to set con.autocommit = True for a psycopg2 connection to the same database. I get the following error:

psycopg2.ProgrammingError: autocommit cannot be used inside a transaction

Why does psycopg2 report that a transaction is still in progress? Is there a different way I should be closing the psql transaction?

If I don't run the shp2pgsql subprocess command, con.autocommit executes successfully. Does shp2pgsql by default leave a transaction open somewhere? (http://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide.bqg doesn't suggest this)

No relevant entries exist in pg_locks to suggest a stalled/idle transaction. I do not use the psycopg2 connection object in the shp2pgsql function. And, if I recreate a new connection object

con = psycopg2.connect(host=db_host, user=db_user, password=db_pass, database=db_name)

after the shp2pgsql function, con.autocommit=True works fine.

Edit: I of course can simply create the psycopg2 connection object after all shp2pgsql imports have finished, but this is not ideal in my code, and I'd rather understand what's happening.

Edit2: setting con.autocommit=True immediately after opening the psycopg2 connection, as opposed to later, bypasses this error.

Edit3: adding MWE

import psycopg2
import os
import subprocess
from glob import glob

def vacuum(con, table=""):
    autocommit_orig = con.autocommit
    con.autocommit = True
    with con.cursor() as cur:
        cur.execute("VACUUM ANALYZE {};".format(table))
    con.autocommit = autocommit_orig

def read_shapefile(path, tablename, srid="4269"):
    command = "shp2pgsql -s {} -a -D -W LATIN1 {} {} | psql -h {} -d {} -U {}".format(srid, path, tablename, host, dbname, user)
    p=subprocess.Popen(command, shell=True)
    p.communicate()

def load_data(con, datapath):
    dir = os.path.join(datapath,dataname)
    shapefiles = glob(os.path.join(dir,"*.shp"))

    for shapefile in shapefiles:
        read_shapefile(shapefile, tablename)

if __name__ == "__main__":
    con = psycopg2.connect(host=db_host, user=db_user, password=db_pass, database=db_name)
    load_data(con, datapath)
    vacuum(con, tablename)

Solution

  • I can't see where this is happening, but according to this, this, and this, a transaction is begun the first time a command is sent to the database.

    Transactions are per-connection, so psql shouldn't be tripping you up.

    Following this advice, my suggestion is that you stick a con.rollback() just before con.autocommit=True in your code. This will end the implicit transaction that's somehow got started. If you still have all the data you expect, than something's issued a SELECT command or similar read-only directive.

    If you move the con.rollback() backwards from con.autocommit=True it will allow you to isolate where the transaction has begun without restructuring your code.

    It is a guess, but perhaps when psql changes the database state psycopg2 begins a transaction at that time? I haven't found docs to support this hypothesis.