I am trying to read a zipped shapefile into a GeoDataFrame from a URL without downloading it first. I am working in a Jupyter Notebook using the latest gds_py environment:
import geopandas as gp
url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip"
gdf = gp.read_file(filename=url, enabled_drivers="ESRI Shapefile")
However, I am getting an error:
---------------------------------------------------------------------------
CPLE_OpenFailedError Traceback (most recent call last)
File fiona/_shim.pyx:83, in fiona._shim.gdal_open_vector()
File fiona/_err.pyx:291, in fiona._err.exc_wrap_pointer()
CPLE_OpenFailedError: '/vsizip//vsimem/008d1526aae040ec874d51b105a9f3a4.zip' not recognized as a supported file format.
During handling of the above exception, another exception occurred:
DriverError Traceback (most recent call last)
Input In [2], in <cell line: 2>()
1 url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip"
----> 2 gdf = gp.read_file(filename=url, driver="ESRI Shapefile")
File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/geopandas/io/file.py:253, in _read_file(filename, bbox, mask, rows, engine, **kwargs)
250 path_or_bytes = filename
252 if engine == "fiona":
--> 253 return _read_file_fiona(
254 path_or_bytes, from_bytes, bbox=bbox, mask=mask, rows=rows, **kwargs
255 )
256 elif engine == "pyogrio":
257 return _read_file_pyogrio(
258 path_or_bytes, bbox=bbox, mask=mask, rows=rows, **kwargs
259 )
File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/geopandas/io/file.py:294, in _read_file_fiona(path_or_bytes, from_bytes, bbox, mask, rows, **kwargs)
291 reader = fiona.open
293 with fiona_env():
--> 294 with reader(path_or_bytes, **kwargs) as features:
295
296 # In a future Fiona release the crs attribute of features will
297 # no longer be a dict, but will behave like a dict. So this should
298 # be forwards compatible
299 crs = (
300 features.crs["init"]
301 if features.crs and "init" in features.crs
302 else features.crs_wkt
303 )
305 # handle loading the bounding box
File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/fiona/collection.py:555, in BytesCollection.__init__(self, bytesbuf, **kwds)
552 self.virtual_file = buffer_to_virtual_file(self.bytesbuf, ext=ext)
554 # Instantiate the parent class.
--> 555 super(BytesCollection, self).__init__(self.virtual_file, vsi=filetype, **kwds)
File /opt/anaconda3/envs/gds/lib/python3.9/site-packages/fiona/collection.py:162, in Collection.__init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
160 if self.mode == 'r':
161 self.session = Session()
--> 162 self.session.start(self, **kwargs)
163 elif self.mode in ('a', 'w'):
164 self.session = WritingSession()
File fiona/ogrext.pyx:540, in fiona.ogrext.Session.start()
File fiona/_shim.pyx:90, in fiona._shim.gdal_open_vector()
DriverError: '/vsizip//vsimem/008d1526aae040ec874d51b105a9f3a4.zip' not recognized as a supported file format.
When I download the zipfile from the URL and open the shapefile in QGIS (3.22.7) it behaves as expected, and as far as I can tell all the required files are present. I've tried resolving the error in a few different ways (below) based on similar questions here and the geopandas documentation, but nothing has worked so far - I keep getting the same error. I've tried:
url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip!NextGen_NHTS_Shapefile/NextGen_Zone_0825.shp"
gdf = gp.read_file(
    filename=url,
    crs="EPSG:4269",
    encoding="utf_8",
    enabled_drivers="ESRI Shapefile",
)
Can anyone help me figure out what the problem is?
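It depends on where the required shapefile files (e.g., .shp, .shx, .dbf, .prj) are located in the ZIP file. Specifically:
If they are in the root of the ZIP file, the URL can be passed to read_file directly, in this form: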
import geopandas as gpd
url = r"https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip"
gdf = gpd.read_file(filename=url, enabled_drivers=["ESRI Shapefile"])
If they are in a subdirectory of the ZIP file, as they are here, add zip+ to the front of the URL and use ! to point to the subdirectory of the ZIP file. For example:
import geopandas as gpd
url = r"zip+https://nhts.ornl.gov/od/assets/data/NextGen_NHTS_Shapefile_v3.zip!NextGen_NHTS_Shapefile"
gdf = gpd.read_file(filename=url, enabled_drivers=["ESRI Shapefile"])
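To check which of the two cases applies to a given archive, you can list its contents before building the path. The following is only a minimal sketch using the standard library: the helper names shapefile_members and zip_url are made up for illustration and are not part of geopandas or Fiona, and it assumes you already have the ZIP as bytes in memory.

```python
import io
import posixpath
import zipfile


def shapefile_members(zip_bytes: bytes) -> list:
    """Return the archive paths of every .shp member in a ZIP given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".shp"))


def zip_url(url: str, member: str) -> str:
    """Build the path to hand to read_file for one .shp member.

    Hypothetical helper: if the member sits in a subdirectory, prepend
    'zip+' and append '!<subdirectory>'; if it sits in the archive root,
    return the URL unchanged.
    """
    subdir = posixpath.dirname(member)
    return f"zip+{url}!{subdir}" if subdir else url
```

The bytes could come from, for example, urllib.request.urlopen(url).read(), which does fetch the archive into memory once; the resulting string from zip_url is what you would then pass as filename to gpd.read_file.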
Note that enabled_drivers expects a list of driver names rather than a single string, as in gpd.read_file(filename=url, enabled_drivers=["ESRI Shapefile"]). In fact, in this case, gpd.read_file(url) would work just fine.
Hope this helps!