python-3.x xml selenium urllib

Downloading an image file in Python 3 from the web using Selenium/urllib


I am trying to download a captcha image, but urllib raises the following error:

urllib.error.HTTPError: HTTP Error 500: Internal Server Error

Here is my code:

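from selenium import webdriver
from selenium.webdriver.common.by import By
import urllib.request

driver = webdriver.Chrome()

driver.get('https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/index')

# find_element(By.XPATH, ...) replaces find_element_by_xpath, which recent Selenium 4 releases removed
img = driver.find_element(By.XPATH, '//*[@id="type_recherche"]/div[5]/div/img')
src = img.get_attribute('src')

# This is the call that raises HTTP Error 500
urllib.request.urlretrieve(src, "captcha.png")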

When I print src, I get the following (the first line is Chrome's DevTools startup message, not part of src):

DevTools listening on ws://127.0.0.1:65317/devtools/browser/36eb75bc-f03c-41ee-96cc-138df591c665
https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/createimage.png?timestamp=1583199024767

Solution

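  • Here is a sample script that saves the captcha as captcha.jpg. It sends the JSESSIONID session cookie with the request; the urllib call in the question sent no cookie, which is the likely cause of the 500 error.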

    import shutil

    import requests

    url = "https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/createimage.png?timestamp=1583203496087"
    # The same cookie kept working after a page refresh, so it should be safe to reuse.
    headers = {
        'Cookie': 'JSESSIONID=YB-eSDCWKU-SG_bKEtluH8kzvWMop4B0plLN4NOLXtO09plZSEuS!-209918963'
    }

    response = requests.get(url, headers=headers, stream=True)
    response.raise_for_status()  # stop here rather than saving an error page as the image
    with open("captcha.jpg", 'wb') as f:
        # Undo any gzip/deflate content encoding while streaming the raw body
        response.raw.decode_content = True
        shutil.copyfileobj(response.raw, f)
    
    


    Below is the complete code. It reads the JSESSIONID cookie from the Selenium session and passes it to requests.

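    import shutil

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()

    driver.get('https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/index')

    # find_element(By.XPATH, ...) replaces find_element_by_xpath, which recent Selenium 4 releases removed
    img = driver.find_element(By.XPATH, '//*[@id="type_recherche"]/div[5]/div/img')
    src = img.get_attribute('src')
    # Reuse the browser's session cookie for the image request
    jsession = driver.get_cookie('JSESSIONID')['value']
    headers = {
        'Cookie': 'JSESSIONID=' + jsession
    }

    response = requests.get(src, headers=headers, stream=True)
    response.raise_for_status()  # stop here rather than saving an error page as the image
    with open("captcha.jpg", 'wb') as f:
        # Undo any gzip/deflate content encoding while streaming the raw body
        response.raw.decode_content = True
        shutil.copyfileobj(response.raw, f)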
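
The same idea can be kept within urllib, which is where the question started. The following is a minimal sketch, not a tested fix for this site: it assumes the 500 error really does come from the missing session cookie, and that forwarding every cookie returned by driver.get_cookies() is acceptable. The helper names cookie_header, build_image_request, and save_image are hypothetical and appear nowhere in the original post.

```python
import urllib.request


def cookie_header(cookies):
    """Join Selenium-style cookie dicts ({'name': ..., 'value': ...}) into one Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_image_request(src, cookies):
    """Build a urllib Request for src that carries the browser's cookies."""
    return urllib.request.Request(src, headers={"Cookie": cookie_header(cookies)})


def save_image(src, cookies, path="captcha.png"):
    """Fetch src with the given cookies and write the response body to path."""
    request = build_image_request(src, cookies)
    with urllib.request.urlopen(request) as response, open(path, "wb") as f:
        f.write(response.read())
```

With the driver and src from the complete code above, the call would be save_image(src, driver.get_cookies()). If the server also checks other headers, such as User-Agent, those would have to be added to the Request in the same way.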