Tags: python, xml, windows, batch-file, urllib2

Python: downloading xml files in batch returns a damaged zip file


Drawing inspiration from this post, I am trying to download a bunch of xml files in batch from a website:

import urllib2

url='http://ratings.food.gov.uk/open-data/'

f = urllib2.urlopen(url)
data = f.read()
with open(r"C:\Users\MyName\Desktop\data.zip", "wb") as code:
    code.write(data)

The zip file is created within seconds, but when I try to open it, an error window comes up:

Windows cannot open the folder.
The Compressed (zipped) Folder "C:\Users\MyName\Desktop\data.zip" is invalid.

What am I doing wrong here?
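A quick way to see what the script actually saved is to look at the bytes of the payload. The sketch below is written for Python 3 (the question uses Python 2's urllib2), and describe_payload is a hypothetical helper name. It uses zipfile.is_zipfile on an in-memory buffer plus a crude prefix check for HTML and XML, so treat its answer as a hint rather than proof.

```python
# Python 3 sketch (the question itself uses Python 2's urllib2).
# describe_payload is a hypothetical helper, not part of any library.
import io
import zipfile


def describe_payload(data: bytes) -> str:
    """Guess what a downloaded payload is from its bytes (a hint, not proof)."""
    # is_zipfile accepts a file-like object, so wrap the bytes in BytesIO.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    # Crude prefix checks on the first non-whitespace bytes.
    head = data.lstrip()[:100].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    if head.startswith(b"<?xml"):
        return "xml"
    return "unknown"


# Example: inspect the file the question's script wrote.
# with open(r"C:\Users\MyName\Desktop\data.zip", "rb") as fh:
#     print(describe_payload(fh.read()))
```

For the file written by the question's script, this would most likely report html, since the open-data URL serves a web page rather than an archive.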


Solution

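  • You are saving the raw response to data.zip, but that URL returns the HTML page that links to the XML files, not a zip archive. Nothing is ever written as an entry inside a zip file, so Windows reports it as invalid. Parse the page for the file links, download each XML file, and add it to a real archive with zipfile: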

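    import urllib2
    import urlparse
    import zipfile
    from bs4 import BeautifulSoup

    url = 'http://ratings.food.gov.uk/open-data/'

    # This URL is an HTML page that links to the XML files;
    # it is not a zip archive, so collect the links first.
    f = urllib2.urlopen(url)
    mainpage = f.read()
    f.close()

    soup = BeautifulSoup(mainpage, 'html.parser')

    # The download links are in the tables inside id="openDataStatic".
    tablewrapper = soup.find(id='openDataStatic')

    fileurls = []
    for table in tablewrapper.find_all('table'):
        # href=True skips <a> tags that have no href attribute.
        for link in table.find_all('a', href=True):
            # Resolve relative links against the page URL.
            fileurls.append(urlparse.urljoin(url, link['href']))

    # Build a real zip archive, adding each XML file as a member.
    with zipfile.ZipFile("data.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for fileurl in fileurls:
            print('Downloading: %s' % fileurl)
            f = urllib2.urlopen(fileurl)
            data = f.read()
            f.close()
            # Use the last path segment of the URL as the member name.
            xmlfilename = fileurl.rsplit('/', 1)[-1]
            zf.writestr(xmlfilename, data)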
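On Python 3, where urllib2 became urllib.request, the same approach can be sketched with the standard library alone. This is an illustrative port, not the answer's original code: it swaps BeautifulSoup for html.parser, and it assumes the file links end in .xml instead of restricting the search to the tables under id="openDataStatic". The names LinkCollector, find_xml_links, zip_files, zip_to_bytes and download_all are hypothetical.

```python
# Python 3 sketch of the answer's approach, standard library only.
# Assumption: the XML download links end in ".xml"; the answer instead
# collects every link inside the tables under id="openDataStatic".
import io
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that end with a suffix."""

    def __init__(self, base_url, suffix=".xml"):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.suffix):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_xml_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def zip_files(named_payloads, target):
    """Write (member_name, bytes) pairs into a new zip at target (path or file object)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in named_payloads:
            zf.writestr(name, payload)


def zip_to_bytes(named_payloads):
    """Same as zip_files, but build the archive in memory and return its bytes."""
    buf = io.BytesIO()
    zip_files(named_payloads, buf)
    return buf.getvalue()


def download_all(page_url, target="data.zip"):
    """Fetch the index page, then each linked XML file, into one zip archive."""
    with urlopen(page_url) as resp:
        # Assumes the page is UTF-8; undecodable bytes are replaced.
        html = resp.read().decode("utf-8", errors="replace")

    def fetch():
        for link in find_xml_links(html, page_url):
            print("Downloading:", link)
            with urlopen(link) as file_resp:
                # Member name = last path segment of the URL, as in the answer.
                yield link.rsplit("/", 1)[-1], file_resp.read()

    zip_files(fetch(), target)
```

Calling download_all('http://ratings.food.gov.uk/open-data/') would then fetch the page and write data.zip in the current directory, assuming the page still links to its XML files this way. Keeping the zip-writing step separate from the network code makes it easy to confirm offline that a valid archive comes out.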