Tags: python, web-scraping, download, python-requests, zip

Download a zip file from a URL using the requests module in Python


When I access this website, my browser opens a dialog to download a zip file.

I am trying to download the zip file with a Python script (I am a beginner in coding). In the future I would like to automate downloading a batch of similar links, but for now I am testing with a single link. Here is my code:

import requests

url = 'https://sigef.incra.gov.br/geo/exportar/vertice/shp/454698fd-6dfa-49a1-8096-bd9bb57b62ca'
r = requests.get(url, verify=False, allow_redirects=True)

open('verticeshp454698fd-6dfa-49a1-8096-bd9bb57b62ca.zip', 'wb').write(r.content)

As output I get a broken zip file, not the one I wanted. I also get the following message in the command prompt:

C:\Users\joaop\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py:979: InsecureRequestWarning: Unverified HTTPS request is being made to host 'sigef.incra.gov.br'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  warnings.warn(

What steps am I missing here? Thanks in advance for your help.


Solution

  • I got it working by adding a trailing `/` to the URL:

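    import requests

    # the `/` at the end is important
    url = 'https://sigef.incra.gov.br/geo/exportar/vertice/shp/454698fd-6dfa-49a1-8096-bd9bb57b62ca/'

    # send a browser-like User-Agent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2866.71 Safari/537.36",
    }

    # verify=False skips certificate verification, which is what triggers the InsecureRequestWarning
    r = requests.get(url, headers=headers, verify=False, allow_redirects=True)
    r.raise_for_status()  # fail on an HTTP error instead of saving an error page as a zip

    # get the filename from the headers, e.g. `454698fd-6dfa-49a1-8096-bd9bb57b62ca_vertice.zip`;
    # strip the surrounding quotes in case the server sends a quoted value
    filename = r.headers['Content-Disposition'].split("filename=")[-1].strip('"')

    with open(filename, 'wb') as f:
        f.write(r.content)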
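
Splitting the `Content-Disposition` header on `filename=` works for this server, but it can break if the header is missing or if extra parameters follow the name. As a rough sketch, the standard library's `email.message.Message` can parse the header instead. The helper name `filename_from_content_disposition` and the fallback name `download.zip` are made up for illustration and are not part of requests.

```python
from email.message import Message
from pathlib import PureWindowsPath


def filename_from_content_disposition(header_value, default="download.zip"):
    """Return the filename parameter of a Content-Disposition value, or `default`."""
    if not header_value:
        return default
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # handles quoted and unquoted values
    if not name:
        return default
    # Keep only the last path component (PureWindowsPath splits on both / and \)
    # so the header cannot point outside the current folder.
    return PureWindowsPath(name).name or default
```

In the script above this would be called as `filename_from_content_disposition(r.headers.get('Content-Disposition'))`.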
    

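Since the question mentions downloading a batch of similar links later, here is one possible way to wrap the same steps in a loop. It is only a sketch under a few assumptions: `session` is expected to behave like a `requests.Session`, the function names and the `download_<n>.zip` naming scheme are invented for this example, and every link is assumed to need the same trailing `/`. Each response is checked with `zipfile` before it is saved, so an HTML error page does not end up on disk as a broken zip.

```python
import io
import zipfile
from pathlib import Path


def with_trailing_slash(url):
    """Append the trailing `/` if the URL does not already end with one."""
    return url if url.endswith("/") else url + "/"


def is_valid_zip(data):
    """Return True if `data` (bytes) is a readable zip archive with intact CRCs."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


def download_zips(session, urls, dest_dir, headers=None):
    """Fetch each URL with `session` and save the responses that are valid zips.

    Returns the list of paths written; non-zip responses are reported and skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        response = session.get(with_trailing_slash(url), headers=headers, timeout=60)
        response.raise_for_status()
        if not is_valid_zip(response.content):
            print(f"Skipping {url}: response is not a valid zip")
            continue
        path = dest / f"download_{index}.zip"  # hypothetical naming scheme
        path.write_bytes(response.content)
        saved.append(path)
    return saved
```

With requests, `session` would be a `requests.Session()`. To reproduce the `verify=False` behaviour of the answer, set `session.verify = False` before calling `download_zips(session, urls, "downloads", headers=headers)`.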