I'm trying to save a GeoPandas GeoDataFrame as a shapefile that is written directly to a zipped folder.
As any shapefile user knows, a shapefile is not a single file but rather a collection of files that are meant to be read together. So calling myGDF.to_file(filename='myshapefile.shp', driver='ESRI Shapefile') creates not only myshapefile.shp but also myshapefile.prj, myshapefile.dbf, myshapefile.shx and myshapefile.cpg. This is probably why I am struggling to get the syntax right here.
Consider, for instance, a dummy GeoDataFrame like:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
data = pd.DataFrame({'name': ['a', 'b', 'c'],
                     'property': ['foo', 'bar', 'foo'],
                     'x': [173994.1578792833, 173974.1578792833, 173910.1578792833],
                     'y': [444135.6032947102, 444186.6032947102, 444111.6032947102]})
geometry = [Point(xy) for xy in zip(data['x'], data['y'])]
myGDF = gpd.GeoDataFrame(data, geometry=geometry)
I saw people using gzip, so I tried:
import geopandas as gpd
myGDF.to_file(filename='myshapefile.shp.gz', driver='ESRI Shapefile',compression='gzip')
But it did not work.
Then I tried the following (in a Google Colab environment):
import zipfile
pathname = '/content/'
filename = 'myshapefile.shp'
zip_file = 'myshapefile.zip'
with zipfile.ZipFile(zip_file, 'w') as zipf:
    zipf.write(myGDF.to_file(filename = '/content/myshapefile.shp', driver='ESRI Shapefile'))
But it only saves the .shp file in a zip folder, while the rest is written next to the zip folder.
How can I write a Geopandas DataFrame as a zipped shapefile directly?
Simply use zip as a file extension, keeping the name of the driver:
myGDF.to_file(filename='myshapefile.shp.zip', driver='ESRI Shapefile')
This should work with GDAL 3.1 or newer.
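If the GDAL build behind GeoPandas is older than 3.1, one possible fallback is to write the shapefile into a temporary directory and then add every sidecar file to the archive with the standard zipfile module. The following is only a minimal sketch: write_zipped_shapefile and its write_shp parameter are hypothetical names chosen for illustration and are not part of GeoPandas. The writer is passed in as a callable, so the helper itself depends only on the standard library.

```python
import tempfile
import zipfile
from pathlib import Path


def write_zipped_shapefile(write_shp, zip_path, stem="myshapefile"):
    """Write a shapefile to a temp directory, then zip all of its parts.

    write_shp is a hypothetical callable that takes the target .shp path
    and writes the shapefile (with whatever sidecar files the driver emits).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        shp_path = Path(tmpdir) / f"{stem}.shp"
        write_shp(str(shp_path))
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
            # Collect every file sharing the stem (.shp, .shx, .dbf, ...).
            for part in sorted(Path(tmpdir).glob(f"{stem}.*")):
                # arcname keeps the files at the root of the archive.
                zipf.write(part, arcname=part.name)
    return zip_path


# Assumed usage with the GeoDataFrame from the question:
# write_zipped_shapefile(
#     lambda path: myGDF.to_file(filename=path, driver='ESRI Shapefile'),
#     'myshapefile.zip',
# )
```

Because the parts are written to a temporary directory that is removed afterwards, nothing is left next to the zip file, which was the problem with the zipfile attempt in the question.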