I am trying to read WGIData.csv into a pandas DataFrame. WGIData.csv is inside a zip file that I am downloading from this URL:
http://databank.worldbank.org/data/download/WGI_csv.zip
But when I try to read it, it throws the error BadZipFile: File is not a zip file
Here is my Python code:
import pandas as pd
from urllib.request import urlopen
from zipfile import ZipFile

class Get_Data():
    def Return_csv_from_zip(self, url):
        self.zip = urlopen(url)
        self.myzip = ZipFile(self.zip)
        self.myzip = self.zip.extractall(self.myzip)
        self.file = pd.read_csv(self.myzip)
        self.zip.close()
        return self.file

url = 'http://databank.worldbank.org/data/download/WGI_csv.zip'
data = Get_Data()
df = data.Return_csv_from_zip(url)
urlopen() returns an HTTPResponse, which is not a seekable file-like object, so you cannot pass it directly to ZipFile(). Instead, you can read() the response and wrap the bytes in io.BytesIO() to do what you need:
In []:
from io import BytesIO
z = urlopen('http://databank.worldbank.org/data/download/WGI_csv.zip')
myzip = ZipFile(BytesIO(z.read())).extract('WGIData.csv')
pd.read_csv(myzip)
Out[]:
   Country Name  Country Code                                     Indicator Name    Indicator Code  1996  \
0      Anguilla           AIA                    Control of Corruption: Estimate            CC.EST   NaN
1      Anguilla           AIA           Control of Corruption: Number of Sources         CC.NO.SRC   NaN
2      Anguilla           AIA             Control of Corruption: Percentile Rank        CC.PER.RNK   NaN
3      Anguilla           AIA  Control of Corruption: Percentile Rank, Lower ...  CC.PER.RNK.LOWER   NaN
4      Anguilla           AIA  Control of Corruption: Percentile Rank, Upper ...  CC.PER.RNK.UPPER   NaN
5      Anguilla           AIA              Control of Corruption: Standard Error        CC.STD.ERR   NaN
...
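
Note that extract() writes WGIData.csv to the current working directory and returns its path. If you would rather keep everything in memory, one option is to open the member directly from the archive and hand that file object to pd.read_csv(). This is only a minimal sketch: read_csv_from_zip_bytes and sample.csv are names made up for illustration, and the tiny archive built at the bottom stands in for the bytes you would get from z.read().

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def read_csv_from_zip_bytes(zip_bytes, member):
    """Return one CSV member of an in-memory zip archive as a DataFrame.

    Hypothetical helper: nothing is written to disk.
    """
    # BytesIO gives ZipFile the seekable file-like object it needs.
    with ZipFile(BytesIO(zip_bytes)) as archive:
        # archive.open() returns a binary file object for the member,
        # which pd.read_csv() can consume directly.
        with archive.open(member) as handle:
            return pd.read_csv(handle)


# Stand-in for the download: build a small archive in memory.
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "Country Name,Country Code\nAnguilla,AIA\n")

demo = read_csv_from_zip_bytes(buffer.getvalue(), "sample.csv")
```

With the real download, the call would presumably look like read_csv_from_zip_bytes(z.read(), 'WGIData.csv'), assuming the archive still contains a member with that name.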