I want to load a zipped file from GitLab into my Jupyter Notebook with the following code:
link='https://git. ... master/data.zip'
import pandas as pd
import urllib.request
urllib.request.urlretrieve(link, "data.zip")
import zipfile
compressed_file = zipfile.ZipFile('data.zip')
csv_file = compressed_file.open('data.csv')
df = pd.read_csv(csv_file)
I cannot download the file manually; I need to get the data directly from the URL!
I get the following error at line 4 (----> 4 compressed_file = zipfile.ZipFile('data.zip')):
BadZipFile: File is not a zip file
What is the error in my code?
Your code sample is not reproducible. The code below shows how to download a zip file from a URL and unzip it. The archive contains GeoJSON, so json.load() is used, but for CSV data this can be pd.read_csv().
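Swapping in pd.read_csv() can also be done without writing a temporary file. The sketch below assumes the zip has already been downloaded into memory (for example as requests.get(url).content); read_csv_from_zip is a hypothetical helper name, and the archive is built in memory only so the example runs offline.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_bytes: bytes, member: str) -> pd.DataFrame:
    """Parse one CSV member of an in-memory zip archive into a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # zf.open() returns a binary file-like object that read_csv accepts
        with zf.open(member) as f:
            return pd.read_csv(f)


# Build a tiny archive in memory to stand in for the downloaded data.zip
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n3,4\n")

df = read_csv_from_zip(buf.getvalue(), "data.csv")
print(df)
```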
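This is effectively a three-step process:
1. requests.get() the URL with stream=True and download the content in chunks to a local file
2. open the local file with ZipFile and list its members with zfile.infolist()
3. open a member and parse it (json.load() here, pd.read_csv() for CSV)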
All of this is standard requests and file handling, independent of how the data is used afterwards.
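The same steps can be sketched with only the standard library (urllib.request, as in the question) plus an explicit zipfile.is_zipfile() check, which turns the opaque BadZipFile into an error showing what was actually downloaded. A common cause of that error is a link that returns an HTML page instead of the raw archive; that is an assumption here, since the question's URL is elided. download_zip is a hypothetical helper, and local file:// URLs are used so the sketch runs offline.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url: str, dest: str) -> zipfile.ZipFile:
    """Hypothetical helper: save url to dest, then open it as a zip archive.

    Raises ValueError with a preview of the first bytes when the download
    is not a zip (e.g. an HTML login or preview page).
    """
    urllib.request.urlretrieve(url, dest)  # step 1: download to a local file
    if not zipfile.is_zipfile(dest):
        with open(dest, "rb") as fh:
            head = fh.read(60)
        raise ValueError(f"{url} did not return a zip file; first bytes: {head!r}")
    return zipfile.ZipFile(dest)  # step 2: open the archive


# Offline demo: one real zip and one HTML page, both served as file:// URLs
tmp = Path(tempfile.mkdtemp())
good = tmp / "good.zip"
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")
bad = tmp / "page.html"
bad.write_text("<!DOCTYPE html><html>...</html>")

zfile = download_zip(good.as_uri(), str(tmp / "copy.zip"))
# Step 3: list the members and read from the first one
names = [info.filename for info in zfile.infolist()]
with zfile.open(zfile.infolist()[0]) as f:
    first_line = f.readline()
zfile.close()

try:
    download_zip(bad.as_uri(), str(tmp / "bad_copy.zip"))
    error = None
except ValueError as exc:
    error = str(exc)

print(names, first_line, error)
```

Printing the first bytes is usually enough to tell whether the URL served the archive itself or a web page.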
import requests
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
import json

# source geojson for country boundaries
geosrc = pd.json_normalize(requests.get("https://pkgstore.datahub.io/core/geo-countries/7/datapackage.json").json()["resources"])
url = geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0]
fn = Path(url).name

# download only if the file is not already in the working directory
if not Path.cwd().joinpath(fn).exists():
    r = requests.get(url, stream=True)
    r.raise_for_status()  # fail early instead of saving an error page as the "zip"
    with open(fn, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)

zfile = ZipFile(fn)
# open the first member of the archive and parse it as JSON
with zfile.open(zfile.infolist()[0]) as f:
    geojson = json.load(f)
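For the CSV case in the question, pandas can also do the unzipping itself: with the default compression="infer", read_csv() picks zip decompression from a ".zip" suffix, for URLs as well as local paths, provided the archive holds a single file. A local file stands in for the elided GitLab link here so the sketch runs offline; with a real link, the URL must return the zip bytes themselves, not a web page.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Create a single-member archive standing in for the question's data.zip
path = Path(tempfile.mkdtemp()) / "data.zip"
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n3,4\n")

# compression="infer" chooses zip from the ".zip" suffix; the same call
# accepts an http(s) URL that serves the raw zip file
df = pd.read_csv(str(path), compression="infer")
print(df.shape)
```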