python gis geopandas shapefile

How do I resolve CPLE_OpenFailedError when reading a zipped shapefile directly from a URL into a GeoDataFrame?


I am trying to read a zipped shapefile into a GeoDataFrame from a URL without downloading it first. I am working in a Jupyter Notebook using the latest gds_py environment:

import geopandas as gp
url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip"
gdf = gp.read_file(filename=url, enabled_drivers="ESRI Shapefile")

However, I am getting an error:


---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
File fiona/_shim.pyx:83, in fiona._shim.gdal_open_vector()

File fiona/_err.pyx:291, in fiona._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsizip//vsimem/008d1526aae040ec874d51b105a9f3a4.zip' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

DriverError                               Traceback (most recent call last)
Input In [2], in <cell line: 2>()
      1 url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip"
----> 2 gdf = gp.read_file(filename=url, driver="ESRI Shapefile")

File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/geopandas/io/file.py:253, in _read_file(filename, bbox, mask, rows, engine, **kwargs)
    250     path_or_bytes = filename
    252 if engine == "fiona":
--> 253     return _read_file_fiona(
    254         path_or_bytes, from_bytes, bbox=bbox, mask=mask, rows=rows, **kwargs
    255     )
    256 elif engine == "pyogrio":
    257     return _read_file_pyogrio(
    258         path_or_bytes, bbox=bbox, mask=mask, rows=rows, **kwargs
    259     )

File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/geopandas/io/file.py:294, in _read_file_fiona(path_or_bytes, from_bytes, bbox, mask, rows, **kwargs)
    291     reader = fiona.open
    293 with fiona_env():
--> 294     with reader(path_or_bytes, **kwargs) as features:
    295 
    296         # In a future Fiona release the crs attribute of features will
    297         # no longer be a dict, but will behave like a dict. So this should
    298         # be forwards compatible
    299         crs = (
    300             features.crs["init"]
    301             if features.crs and "init" in features.crs
    302             else features.crs_wkt
    303         )
    305         # handle loading the bounding box

File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/fiona/collection.py:555, in BytesCollection.__init__(self, bytesbuf, **kwds)
    552 self.virtual_file = buffer_to_virtual_file(self.bytesbuf, ext=ext)
    554 # Instantiate the parent class.
--> 555 super(BytesCollection, self).__init__(self.virtual_file, vsi=filetype, **kwds)

File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/fiona/collection.py:162, in Collection.__init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
    160 if self.mode == 'r':
    161     self.session = Session()
--> 162     self.session.start(self, **kwargs)
    163 elif self.mode in ('a', 'w'):
    164     self.session = WritingSession()

File fiona/ogrext.pyx:540, in fiona.ogrext.Session.start()

File fiona/_shim.pyx:90, in fiona._shim.gdal_open_vector()

DriverError: '/vsizip//vsimem/008d1526aae040ec874d51b105a9f3a4.zip' not recognized as a supported file format.

When I download the ZIP file from the URL and open the shapefile in QGIS (3.22.7), it behaves as expected, and as far as I can tell all the required files are present. Based on similar questions here and the geopandas documentation, I've tried resolving the error in the ways listed below, but I keep getting the same error. I've tried:

  • Using a url variable value that reflects the structure within the zipfile.
url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip!NextGen_NHTS_Shapefile/NextGen_Zone_0825.shp"

  • Setting the crs and encoding arguments.
gdf = gp.read_file(
    filename=url,
    crs="EPSG:4269",
    encoding="utf_8", 
    enabled_drivers="ESRI Shapefile"
)
  • Using more roundabout ways to read the file, like the second strategy in the accepted answer to this question.

Can anyone help me figure out what the problem is?


Solution

  • It depends on where the shapefile's component files (e.g., .shp, .shx, .dbf, .prj) are located inside the ZIP file. Specifically:

    1. A direct URL to the ZIP file works if all of the shapefile's component files are in the root folder of the archive. In that case, the script below should work:
      import geopandas as gpd
      url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip"
      gdf = gpd.read_file(filename=url, enabled_drivers=["ESRI Shapefile"])
      
    2. If the shapefile's component files are in a subdirectory of the ZIP file, prefix the URL with zip+ and append ! followed by the path of that subdirectory inside the archive. For example:
      import geopandas as gpd
      url = r"zip+https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip!NextGen_NHTS_Shapefile"
      gdf = gpd.read_file(filename=url, enabled_drivers=["ESRI Shapefile"])
      
    3. Note that enabled_drivers must be a list of driver names, not a single string, so the call should be gpd.read_file(filename=url, enabled_drivers=["ESRI Shapefile"]). In this case the argument can be omitted entirely: gpd.read_file(url), with url set as in case 2, works just as well.
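
To check which of the two cases applies, one option is to list the archive's members from a copy you have already downloaded. The sketch below is illustrative only: shapefile_paths and zip_url are hypothetical helper names (not part of geopandas or fiona), and zip_url does nothing more than assemble the path strings described in cases 1 and 2.

```python
import io
import posixpath
import zipfile


def shapefile_paths(source):
    """Return the archive-internal paths of every .shp member.

    `source` is anything zipfile.ZipFile accepts: a path to a local
    copy of the archive or a file-like object such as io.BytesIO.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".shp")
        )


def zip_url(url, member):
    """Assemble the string to pass to read_file for one .shp member.

    Hypothetical helper: it only mirrors cases 1 and 2 above.
    """
    folder = posixpath.dirname(member)
    if not folder:
        # Case 1: files sit in the archive root, so the plain URL is used.
        return url
    # Case 2: files sit in a subdirectory, so add zip+ and !subdirectory.
    return f"zip+{url}!{folder}"
```

For the archive in the question, calling shapefile_paths on the downloaded ZIP and passing the first result to zip_url together with the original URL should reproduce the string used in case 2, assuming the internal layout matches the NextGen_NHTS_Shapefile/NextGen_Zone_0825.shp path mentioned in the question.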

    Hope this helps!