Zipped File from URL to Python (Pandas)

I want to load a zipped file from GitLab into my Jupyter Notebook with the following code:

link='https://git. ... master/'

import urllib.request
urllib.request.urlretrieve(link, "")
import zipfile
compressed_file = zipfile.ZipFile('')
csv_file = compressed_file.open('data.csv')
df = pd.read_csv(csv_file)

I could not download it; I need to get the data from the URL!

I get the following error on line 4 (----> 4 compressed_file = zipfile.ZipFile('')):

BadZipFile: File is not a zip file

What is the error in my code?


  • Your code sample is not reproducible. The code below shows how to download a zip file from a URL and unzip it. The data is GeoJSON, so json.load() is used, but this can be pd.read_csv() for CSV data. This is effectively a three-step process:

    • pass a URL to requests.get() and write the downloaded chunks to a local file
    • inspect the contents of the zip file with zfile.infolist() to find the file you want to use
    • open a file handle to that member and use it; in your case, pass it to pd.read_csv()

    All of this is standard requests and file handling, independent of what the data is used for.

    import requests
    import pandas as pd
    from pathlib import Path
    from zipfile import ZipFile
    import json, io
    # source geojson for country boundaries
    geosrc = pd.json_normalize(requests.get("").json()["resources"])
    fn = Path(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0]).name
    if not Path.cwd().joinpath(fn).exists():
        r = requests.get(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0],stream=True,)
        with open(fn, "wb") as fd:
            for chunk in r.iter_content(chunk_size=128):
                # write each downloaded chunk to the local zip file
                fd.write(chunk)
    zfile = ZipFile(fn)
    # open the first member listed in the archive
    with zfile.open(zfile.infolist()[0]) as f:
        geojson = json.load(f)
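For the CSV case in the question, the same three steps can be sketched with the standard library plus pandas. This is only a sketch under assumptions: the helper names fetch_bytes, open_zip_member and read_zipped_csv are hypothetical, and 'data.csv' is taken from the question. A BadZipFile error often means the downloaded bytes are not an archive at all (for example, an HTML page returned by the server), so the sketch checks the payload with zipfile.is_zipfile() before opening it. Whether that is the cause in the question cannot be confirmed from the code shown.

```python
import io
import urllib.request
import zipfile
from typing import Optional


def fetch_bytes(url: str) -> bytes:
    """Download the whole response body into memory."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def open_zip_member(payload: bytes, member: Optional[str] = None):
    """Return a binary file handle for one member of an in-memory zip archive."""
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        # Likely an HTML page or error message rather than an archive;
        # show the first bytes so the cause is easier to spot.
        raise ValueError(f"payload is not a zip archive, starts with: {payload[:60]!r}")
    buffer.seek(0)
    archive = zipfile.ZipFile(buffer)
    # Fall back to the first member when no name is given.
    name = member if member is not None else archive.namelist()[0]
    return archive.open(name)


def read_zipped_csv(url: str, member: str = "data.csv"):
    """Download a zip archive and load one CSV member into a DataFrame."""
    import pandas as pd  # imported here so the helpers above work without pandas

    with open_zip_member(fetch_bytes(url), member) as handle:
        return pd.read_csv(handle)
```

With a URL that really returns the raw archive, df = read_zipped_csv(link) would give the DataFrame. If the ValueError is raised instead, the printed bytes show what the server actually sent.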