Tags: python, download, python-requests, zip

Using requests module to download zip file from aspx site


I'm trying to download a zip file using the requests module. Running the code below creates a zip file on my machine, but the file contains only the HTML of an error page. If I enter the same URL into a browser, the zipped file downloads correctly.

import requests
zipurl = "https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\data%20products\DCAD2021_CURRENT.zip"
zname =  "DCAD2021_CURRENT.zip"
resp = requests.get(zipurl)
zfile = open(zname, 'wb')
zfile.write(resp.content)
zfile.close()  

Solution

  • TL;DR: The zipurl you provided works in the browser because the browser encodes and escapes some of its characters. The request that works with requests is:

    import requests
    
    params = {
        'type': '3',
        'id': '//DCAD.ORG/WEB/WEBDATA/WEBFORMS/data products/DCAD2021_CURRENT.zip',
    }
    
    response = requests.get('https://www.dallascad.org/ViewPDFs.aspx', params=params) 
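
The snippet above stops at the request. As a rough sketch of the remaining step, the helper below writes the body to disk only when it really is a zip archive, which guards against silently saving an HTML error page as in the question. The name save_if_zip is made up for this example and is not part of requests; it uses only the standard library.

```python
import io
import zipfile
from pathlib import Path


def save_if_zip(content: bytes, dest: str) -> bool:
    """Write content to dest only if it looks like a zip archive.

    Hypothetical helper for illustration, not part of requests.
    """
    # is_zipfile accepts a file-like object, so wrap the bytes in memory.
    if not zipfile.is_zipfile(io.BytesIO(content)):
        return False
    Path(dest).write_bytes(content)
    return True
```

With the response from above, the call would be save_if_zip(response.content, "DCAD2021_CURRENT.zip"). A False return means the server sent something other than a zip, and response.text is then worth inspecting.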
    

    I worked this out as follows:

    With the browser's network inspector open, I navigated to the zipurl and copied the request as a curl command. I pasted that command into https://curl.trillworks.com/ to convert it to Python and checked whether the resulting request worked. It did. I then removed the headers and verified that it still worked. Finally, I compared the two URLs and saw differences in the encoding and in the direction of the slashes:

    requests.utils.unquote(response.url)
    'https://www.dallascad.org/ViewPDFs.aspx?type=3&id=//DCAD.ORG/WEB/WEBDATA/WEBFORMS/data+products/DCAD2021_CURRENT.zip'
    

    vs.

    requests.utils.unquote(zipurl)
    'https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\\WEB\\WEBDATA\\WEBFORMS\\data+products\\DCAD2021_CURRENT.zip'
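
The encoded form of the working parameters can be reproduced offline with the standard library. This is only an illustration: it assumes that requests encodes params in much the same way as urllib.parse.urlencode, and the helper name encode_query is made up for this example.

```python
from urllib.parse import unquote, urlencode

# Parameters from the working request above.
params = {
    "type": "3",
    "id": "//DCAD.ORG/WEB/WEBDATA/WEBFORMS/data products/DCAD2021_CURRENT.zip",
}


def encode_query(query: dict) -> str:
    """Percent-encode a dict into a query string (hypothetical helper)."""
    return urlencode(query)


encoded = encode_query(params)
# Forward slashes become %2F and the space becomes "+".
print(encoded)

# unquote() reverses the percent-escapes but leaves "+" alone,
# which is why the decoded URL still shows "data+products".
print(unquote(encoded))
```

Letting the library build the query string from a dict avoids hand-escaping backslashes and spaces inside a Python string literal.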