I am loading a shapefile into a postGIS database using shp2pgsql, piped via psql, wrapped in a python subprocess like this:
command = "shp2pgsql -s 4269 -a -D -W LATIN1 file.shp table | psql -h host -d db -U user"
p=subprocess.Popen(command, shell=True)
p.communicate()
This works perfectly, with the following output:
Loading objects...
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
SET
SET
BEGIN
COMMIT
There is no END
statement, but to the best of my knowledge END
and COMMIT
are equivalent.
I then want to set con.autocommit = True
for a psycopg2 connection to the same database. I get the following error:
psycopg2.ProgrammingError: autocommit cannot be used inside a transaction
Why does psycopg2 report that a transaction is still in progress? Is there a different way I should be closing the psql transaction?
If I don't run the shp2pgsql subprocess command, con.autocommit
executes successfully. Does shp2pgsql by default leave a transaction open somewhere? (http://www.bostongis.com/pgsql2shp_shp2pgsql_quickguide.bqg doesn't suggest this)
No relevant entries exist in pg_locks
to suggest a stalled/idle transaction. I do not use the psycopg2 connection object in the shp2pgsql function. And, if I recreate a new connection object
con = psycopg2.connect(host=db_host, user=db_user, password=db_pass, database=db_name)
after the shp2pgsql function, con.autocommit=True
works fine.
Edit: I of course can simply create the psycopg2 connection object after all shp2pgsql imports have finished, but this is not ideal in my code, and I'd rather understand what's happening.
Edit2: setting con.autocommit=True
immediately after opening the psycopg2 connection, as opposed to later, bypasses this error.
Edit3: adding MWE
import psycopg2
import os
import subprocess
from glob import glob
def vacuum(con, table=""):
autocommit_orig = con.autocommit
con.autocommit = True
with con.cursor() as cur:
cur.execute("VACUUM ANALYZE {};".format(table))
con.autocommit = autocommit_orig
def read_shapefile(path, tablename, srid="4269"):
command = "shp2pgsql -s {} -a -D -W LATIN1 {} {} | psql -h {} -d {} -U {}".format(srid, path, tablename, host, dbname, user)
p=subprocess.Popen(command, shell=True)
p.communicate()
def load_data(con, datapath):
dir = os.path.join(datapath,dataname)
shapefiles = glob(os.path.join(dir,"*.shp"))
for shapefile in shapefiles:
read_shapefile(shapefile, tablename)
if __name__ == "__main__":
con = psycopg2.connect(host=db_host, user=db_user, password=db_pass, database=db_name)
load_data(con, datapath)
vacuum(con, tablename)
I can't see where this is happening, but according to this, this, and this, a transaction is begun the first time a command is sent to the database.
Transactions are per-connection, so psql
shouldn't be tripping you up.
Following this advice, my suggestion is that you stick a con.rollback()
just before con.autocommit=True
in your code. This will end the implicit transaction that's somehow got started. If you still have all the data you expect, than something's issued a SELECT
command or similar read-only directive.
If you move the con.rollback()
backwards from con.autocommit=True
it will allow you to isolate where the transaction has begun without restructuring your code.
It is a guess, but perhaps when psql
changes the database state psycopg2 begins a transaction at that time? I haven't found docs to support this hypothesis.