
Using Pandas, how to read a CSV file inside a zip file fetched from a URL [Python]


This url https://ihmecovid19storage.blob.core.windows.net/latest/ihme-covid19.zip

contains 2 CSV files and 1 PDF with Covid-19 data, and is updated daily.

I want to be able to load the Summary_stats_all_locs.csv as a Pandas DataFrame.

Usually, if a URL points directly to a CSV, I can just use df = pd.read_csv(url), but since the CSV is inside a zip, I can't do that here.

How would I do this?

Thanks


Solution

  • You first need to fetch the file, then open it with ZipFile from the zipfile module. Pandas can actually read a CSV directly from a zip archive, but only when the archive contains a single file. This one contains several, so we need to open the archive ourselves and specify the file name.

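    import requests
    import pandas as pd
    from zipfile import ZipFile
    from io import BytesIO
    
    # Download the archive into memory and fail early on an HTTP error
    r = requests.get("https://ihmecovid19storage.blob.core.windows.net/latest/ihme-covid19.zip")
    r.raise_for_status()
    # Wrap the downloaded bytes in a file-like object so ZipFile can read them
    files = ZipFile(BytesIO(r.content))
    # The dated folder name is the one in the archive at the time of writing; adjust it if the contents change
    df = pd.read_csv(files.open("2020_05_16/Summary_stats_all_locs.csv"))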
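
Since the archive is updated daily, the dated folder in the path above may not stay the same. A minimal sketch of one way around that, assuming the CSV keeps its file name and only the folder changes: list the archive members and pick the one whose path ends with the wanted name. The helper names `read_member_by_suffix` and `load_csv_from_zip_url` are hypothetical, chosen for illustration.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests


def read_member_by_suffix(zip_bytes, suffix):
    """Read the single archive member whose path ends with `suffix` (hypothetical helper)."""
    with ZipFile(BytesIO(zip_bytes)) as archive:
        # namelist() returns every member path, including any folder prefix
        matches = [name for name in archive.namelist() if name.endswith(suffix)]
        if len(matches) != 1:
            raise ValueError(f"expected one member ending in {suffix!r}, found {matches}")
        with archive.open(matches[0]) as member:
            return pd.read_csv(member)


def load_csv_from_zip_url(url, suffix):
    """Download a zip archive and load the matching CSV (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop here on a failed download
    return read_member_by_suffix(response.content, suffix)
```

With the URL from the question, this would be called as `df = load_csv_from_zip_url("https://ihmecovid19storage.blob.core.windows.net/latest/ihme-covid19.zip", "Summary_stats_all_locs.csv")`. Raising an error when zero or several members match avoids silently loading the wrong file.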