python postgresql zip openstreetmap unzip

Postgres or Python script raises error when trying to access database


I've been following this tutorial.

I am working on a VM that, due to company policy, can't be connected to the Internet, so I modified the "Shapefile download" step in the "Loading data" section. I downloaded the required .zip archives (specified in external-data.yml) on a different machine and copied them over via pscp. I also modified the Python script get-external-data.py to only unzip the files rather than try to download them.

The expected behavior is that the shapefiles are loaded into the prepared Postgres database.

The actual behavior is the following:

ubuadmin@klab-osm:~/src/openstreetmap-carto$ ls
antarctica-icesheet-outlines-3857.zip  Dockerfile.db      ne_110m_admin_0_boundary_lines_land.zip  road-colors.yaml
antarctica-icesheet-polygons-3857.zip  Dockerfile.import  openstreetmap-carto.lua                  scripts
CARTOGRAPHY.md                         DOCKER.md          openstreetmap-carto.style                simplified-water-polygons-split-3857.zip
CHANGELOG.md                           external-data.yml  package-lock.json                        style
CODE_OF_CONDUCT.md                     indexes.sql        patterns                                 symbols
CONTRIBUTING.md                        indexes.yml        preview.png                              USECASES.md
data                                   INSTALL.md         project.mml                              water-polygons-split-3857.zip
docker-compose.yml                     LICENSE.txt        README.md
Dockerfile                             mapnik.xml         RELEASES.md

Everything appears to be in order: all 4 .zip archives are in place. So I run the next step:

ubuadmin@klab-osm:~/src/openstreetmap-carto$ sudo python3 ./scripts/get-external-data.py
[sudo] password for ubuadmin:
INFO:root:Starting load of external data into database
INFO:root:Checking table simplified_water_polygons
INFO:root:  Decompressing files
INFO:root:  Water Polygons
INFO:root:  Antarctica Icesheet Polygons
INFO:root:  Antarctica Icesheet Polygons
INFO:root:  Admin Boundary Lines
INFO:root:  Decompressing done
INFO:root:  Importing into database
CRITICAL:root:ogr2ogr returned 1 with layer simplified_water_polygons
CRITICAL:root:Command line was ogr2ogr -f PostgreSQL -lco GEOMETRY_NAME=way -lco SPATIAL_INDEX=FALSE -lco EXTRACT_SCHEMA_FROM_LAYER_NAME=YES -nln loading.simplified_water_polygons "PG:dbname=gis port=5432 user=user host=localhost password=password" data/simplified_water_polygons/simplified-water-polygons-split-3857/simplified_water_polygons.shp
CRITICAL:root:Output was

Traceback (most recent call last):
  File "./scripts/get-external-data.py", line 290, in main
    subprocess.check_output(
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ogr2ogr', '-f', 'PostgreSQL', '-lco', 'GEOMETRY_NAME=way', '-lco', 'SPATIAL_INDEX=FALSE', '-lco', 'EXTRACT_SCHEMA_FROM_LAYER_NAME=YES', '-nln', 'loading.simplified_water_polygons', 'PG:dbname=gis port=5432 user=ubuadmin host=localhost password=ubuadmin', 'data/simplified_water_polygons/simplified-water-polygons-split-3857/simplified_water_polygons.shp']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./scripts/get-external-data.py", line 313, in <module>
    main()
  File "./scripts/get-external-data.py", line 299, in main
    raise RuntimeError(
RuntimeError: ogr2ogr error when loading table simplified_water_polygons

I am using:

  • Ubuntu 20.04
  • postgres 12
  • postgis 3.1
  • python 3.8.10

The modified part of the get-external-data.py script looks like this:

if "archive" in source and source["archive"]["format"] == "zip":
    logging.info("  Decompressing files")

    # strange archive name - was manually added, because of no Internet connection
    logging.info("  Water Polygons")
    with zipfile.ZipFile("water-polygons-split-3857.zip", 'r') as zip_ref:
        zip_ref.extractall(workingdir)
    logging.info("  Antarctica Icesheet Polygons")
    with zipfile.ZipFile("antarctica-icesheet-polygons-3857.zip", 'r') as zip_ref:
        zip_ref.extractall(workingdir)
    logging.info("  Antarctica Icesheet Polygons")
    with zipfile.ZipFile("antarctica-icesheet-outlines-3857.zip", 'r') as zip_ref:
        zip_ref.extractall(workingdir)
    logging.info("  Admin Boundary Lines")
    with zipfile.ZipFile("ne_110m_admin_0_boundary_lines_land.zip", 'r') as zip_ref:
        zip_ref.extractall(workingdir)

    logging.info("  Decompressing done")

ogrpg = "PG:dbname={}".format(database)

if port is not None:
    ogrpg = ogrpg + " port={}".format(port)
if user is not None:
    ogrpg = ogrpg + " user={}".format(user)
if host is not None:
    ogrpg = ogrpg + " host={}".format(host)
if password is not None:
    ogrpg = ogrpg + " password={}".format(password)

ogrcommand = ["ogr2ogr",
              '-f', 'PostgreSQL',
              '-lco', 'GEOMETRY_NAME=way',
              '-lco', 'SPATIAL_INDEX=FALSE',
              '-lco', 'EXTRACT_SCHEMA_FROM_LAYER_NAME=YES',
              '-nln', "{}.{}".format(config["settings"]["temp_schema"], name)]

if "ogropts" in source:
    ogrcommand += source["ogropts"]

if DEBUG:
    logging.info(f"Config: \n{config}")
    logging.info(f"Source: \n{source}")

ogrcommand += [ogrpg,
               os.path.join(workingdir, source["file"])]

logging.info("  Importing into database")
logging.debug("running {}".format(
    subprocess.list2cmdline(ogrcommand)))

# ogr2ogr can raise errors here, so they need to be caught
try:
    subprocess.check_output(
        ogrcommand, stderr=subprocess.PIPE, universal_newlines=True)
except subprocess.CalledProcessError as e:
    # Add more detail on stdout for the logs
    logging.critical(
        "ogr2ogr returned {} with layer {}".format(e.returncode, name))
    logging.critical("Command line was {}".format(
        subprocess.list2cmdline(e.cmd)))
    logging.critical("Output was\n{}".format(e.output))
    raise RuntimeError(
        "ogr2ogr error when loading table {}".format(name))

logging.info("  Import complete")

It seems that unzipping the files works just fine, but the Postgres database somehow blocks access to the specified tables.
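
One detail in the log above is worth a note: the line after "Output was" is empty. The script passes stderr=subprocess.PIPE to check_output, so anything ogr2ogr writes to standard error ends up on the exception's stderr attribute, while e.output holds standard output only. Below is a minimal sketch of logging both. The helper name run_and_report is hypothetical (it is not part of get-external-data.py), and the failing child process is simulated with the Python interpreter so the example runs without ogr2ogr installed.

```python
import logging
import subprocess
import sys


def run_and_report(command, name):
    """Run command; on failure log stdout and stderr, then raise RuntimeError.

    Hypothetical helper, not part of get-external-data.py.
    """
    try:
        return subprocess.check_output(
            command, stderr=subprocess.PIPE, universal_newlines=True)
    except subprocess.CalledProcessError as e:
        logging.critical(
            "command returned %s with layer %s", e.returncode, name)
        # e.output is stdout only; the error text is usually in e.stderr
        logging.critical("stdout was\n%s", e.output)
        logging.critical("stderr was\n%s", e.stderr)
        raise RuntimeError(
            "error when loading table {}: {}".format(name, e.stderr.strip())
        ) from e


if __name__ == "__main__":
    # Stand-in for a failing ogr2ogr call: writes to stderr and exits with 1
    failing = [
        sys.executable, "-c",
        "import sys; sys.stderr.write('Unable to open datasource'); sys.exit(1)",
    ]
    try:
        run_and_report(failing, "simplified_water_polygons")
    except RuntimeError as err:
        print(err)
```

With the child's stderr in the log, the reason ogr2ogr exited with status 1 would probably have been visible straight away.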


Solution

  • Thanks @Tomerikoo

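    I found the answer, so here it is. Apparently asking a question helps you find the solution, even if nobody answers it. The mistake was in the Python script: one of the .zip archives, simplified-water-polygons-split-3857.zip, was never unzipped, so the shapefile that ogr2ogr was asked to load had never been extracted. The fixed part of the script is the following: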

    if "archive" in source and source["archive"]["format"] == "zip":
        logging.info("  Decompressing files")
        # Archive names are hard-coded here because there is no Internet connection
        logging.info("  Simplified Water Polygons")
        with zipfile.ZipFile("simplified-water-polygons-split-3857.zip", 'r') as zip_ref:
            zip_ref.extractall(workingdir)
        logging.info("  Water Polygons")
        with zipfile.ZipFile("water-polygons-split-3857.zip", 'r') as zip_ref:
            zip_ref.extractall(workingdir)
        logging.info("  Antarctica Icesheet Polygons")
        with zipfile.ZipFile("antarctica-icesheet-polygons-3857.zip", 'r') as zip_ref:
            zip_ref.extractall(workingdir)
        logging.info("  Antarctica Icesheet Outlines")
        with zipfile.ZipFile("antarctica-icesheet-outlines-3857.zip", 'r') as zip_ref:
            zip_ref.extractall(workingdir)
        logging.info("  Admin Boundary Lines")
        with zipfile.ZipFile("ne_110m_admin_0_boundary_lines_land.zip", 'r') as zip_ref:
            zip_ref.extractall(workingdir)
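
Repeating a near-identical block per archive makes it easy to drop one, which is what went wrong here. Below is a minimal sketch of a loop that fails early when an archive, or the shapefile expected after extraction, is missing. The names ARCHIVES and extract_local_archives and the function's parameters are hypothetical, not part of get-external-data.py; the archive names are taken from the directory listing in the question.

```python
import os
import zipfile

# Archive names from the directory listing in the question
ARCHIVES = [
    "simplified-water-polygons-split-3857.zip",
    "water-polygons-split-3857.zip",
    "antarctica-icesheet-polygons-3857.zip",
    "antarctica-icesheet-outlines-3857.zip",
    "ne_110m_admin_0_boundary_lines_land.zip",
]


def extract_local_archives(archives, archive_dir, workingdir, expected_file):
    """Extract every archive into workingdir, then check expected_file exists.

    Hypothetical helper: expected_file plays the role of source["file"],
    the path that is later handed to ogr2ogr.
    """
    for archive in archives:
        path = os.path.join(archive_dir, archive)
        if not os.path.isfile(path):
            raise FileNotFoundError("archive not found: {}".format(path))
        with zipfile.ZipFile(path, "r") as zip_ref:
            zip_ref.extractall(workingdir)

    target = os.path.join(workingdir, expected_file)
    if not os.path.isfile(target):
        raise FileNotFoundError(
            "{} is missing after extraction".format(target))
    return target
```

Called as extract_local_archives(ARCHIVES, ".", workingdir, source["file"]) before ogrcommand is built, a forgotten archive would surface as a FileNotFoundError naming the missing file rather than as an ogr2ogr exit status of 1.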