Tags: python, pandas, binance

Reading ZIP file from URL generates BadZipFile error


I am trying to download historical crypto data from data.binance.vision using Python, reading the zip files directly into pandas with pd.read_csv. This worked a few months ago, but now it fails with zipfile.BadZipFile: File is not a zip file. I downloaded the file manually and checked it: it is a valid zip archive and contains a CSV file. The generated URL also looks correct to me. How should I proceed?

import pandas as pd
import json, requests
    
base_url = 'https://data.binance.vision/?prefix=data/spot/monthly/klines/NULSBTC/1w/'
url = f'{base_url}NULSBTC-1w-2023-02.zip'
    
df = pd.read_csv(url)
print(df)

Error:

zipfile.BadZipFile: File is not a zip file

Solution

  • Update 1: add column names

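    Your base URL is the one for browsing the file listing, not for downloading. The ?prefix= address most likely returns an HTML page rather than the archive, so pandas sees the .zip suffix, tries to unzip the response, and raises BadZipFile. Drop ?prefix= to get the direct download link: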

    #                     Remove ?prefix= --v
    base_url = 'https://data.binance.vision/data/spot/monthly/klines/NULSBTC/1w/'
    url = f'{base_url}NULSBTC-1w-2023-02.zip'
    
    cols = ['Open time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Close time',
            'Quote asset volume', 'Number of trades', 'Taker buy base asset volume',
            'Taker buy quote asset volume', 'Ignore']
    
    df = pd.read_csv(url, header=None, names=cols)  # No headers
    

    Output:

    >>> df
           Open time      Open      High       Low     Close     Volume     Close time  Quote asset volume  Number of trades  Taker buy base asset volume  Taker buy quote asset volume  Ignore
    0  1675641600000  0.000011  0.000015  0.000011  0.000013  7807851.0  1676246399999          100.579287             26961                    4032754.0                     52.369729       0
    1  1676246400000  0.000013  0.000013  0.000011  0.000012  3974078.0  1676851199999           47.905902             17172                    1972135.0                     23.822658       0
    2  1676851200000  0.000012  0.000016  0.000012  0.000012  6897952.0  1677455999999           91.021976             23500                    3428533.0                     45.894707       0
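If you want to confirm what a URL actually returns before handing it to pandas, a check along these lines may help. This is only a minimal sketch: the helper names (is_zip_payload, explain_payload, read_zipped_csv) are made up for illustration, and the HTML detection is a rough guess based on the first bytes of the response, not something Binance documents.

```python
import io
import zipfile
from urllib.request import urlopen


def is_zip_payload(payload: bytes) -> bool:
    """Return True if the downloaded bytes form a readable ZIP archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def explain_payload(payload: bytes) -> str:
    """Give a short, human-readable guess at what the server sent back."""
    if is_zip_payload(payload):
        return "zip archive"
    # Heuristic (assumption): an HTML page starts with a doctype or <html> tag.
    head = payload[:200].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page (probably a browsing URL, not a download URL)"
    return "unknown content"


def read_zipped_csv(url: str, names: list[str]):
    """Download url, verify it is a ZIP, then parse the CSV inside with pandas."""
    # Imported here so the two checks above work with the standard library only.
    import pandas as pd

    with urlopen(url, timeout=30) as response:
        payload = response.read()
    if not is_zip_payload(payload):
        raise ValueError(f"{url} returned {explain_payload(payload)}")
    # compression="zip" expects exactly one file inside the archive.
    return pd.read_csv(io.BytesIO(payload), compression="zip",
                       header=None, names=names)
```

Calling read_zipped_csv(url, cols) with the corrected url and the cols list from the answer should give the same DataFrame as pd.read_csv(url, header=None, names=cols). With the ?prefix= URL it should raise a ValueError that says what came back instead of the less informative BadZipFile.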