python ssl urllib

URLLib with cert SSLv3 alert handshake failure


I'm using Python 3.7.3 and the requests_pkcs12 library to scrape a website where I must pass a certificate and password, then download and extract zip files from links on the page. The first part works fine, but when I try to read the files using urllib, I get an error.

import urllib.request
from bs4 import BeautifulSoup
import requests
from requests_pkcs12 import get

# get page and setup BeautifulSoup
# r = requests.get(url) # old non-cert method
r = get(url, pkcs12_filename=certpath, pkcs12_password=certpwd)

# find zip files to download
soup = BeautifulSoup(r.content, "html.parser")

# Read files
i = 1
for td in soup.find_all(lambda tag: tag.name=='td' and tag.text.strip().endswith('DAILY.zip')):
    link = td.find_next('a')
    print(td.get_text(strip=True), link['href'] if link else '')  # good
    zipurl = 'https://my.downloadsite.com' + link['href'] if link else ''
    print(zipurl)  # good
    # Read zip file from URL
    url = urllib.request.urlopen(zipurl)  # ERROR on this line SSLv3 alert handshake failure
    zippedData = url.read()

I've seen various older posts for Python 2.x on ways to handle this, but I'm wondering what the best way to do it is now, with the newer libraries available in Python 3.7.x.

Below is the stack trace of the error.



Solution

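  • The answer was to stop using urllib for the download and instead use the same requests_pkcs12 replacement for requests, which accepts a .pfx file and its password. The urlopen call was made without the client certificate, which is presumably why the server rejected the handshake.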

    The last two lines:

    url = urllib.request.urlopen(zipurl)  # ERROR on this line SSLv3 alert handshake failure
    zippedData = url.read()
    

    should be replaced with:

    from requests_pkcs12 import get, post
    certpath = "c:/certs/cert1.pfx"
    certpwd = "mypassword1234"
    # ... (rest of the script unchanged)
    url = get(zipurl, pkcs12_filename=certpath, pkcs12_password=certpwd)
    zippedData = url.content
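
Since the question also mentions extracting the zip files, here is a minimal sketch of how the downloaded bytes could be unpacked in memory. It is not from the original answer: the helper names (build_zip_url, extract_zip_bytes, fetch_and_extract) and the BASE_URL constant are hypothetical, and it assumes the href values on the page can be resolved against the site root with urljoin rather than by string concatenation.

```python
import io
import zipfile
from urllib.parse import urljoin

# Hypothetical base URL, taken from the question's example host.
BASE_URL = "https://my.downloadsite.com"


def build_zip_url(base_url, href):
    """Resolve a link's href against the site's base URL."""
    return urljoin(base_url, href)


def extract_zip_bytes(zipped_data):
    """Return {member name: file bytes} for a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zipped_data)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def fetch_and_extract(href, certpath, certpwd):
    """Download one zip with the client certificate and unpack it in memory."""
    # Imported here so the helpers above work without the third-party package.
    from requests_pkcs12 import get

    response = get(build_zip_url(BASE_URL, href),
                   pkcs12_filename=certpath, pkcs12_password=certpwd)
    response.raise_for_status()  # fail early on HTTP errors such as 403 or 404
    return extract_zip_bytes(response.content)
```

Inside the loop from the question, this would be called as fetch_and_extract(link['href'], certpath, certpwd). Keeping the archive in memory avoids writing temporary files, which is probably fine for small daily zips; for large archives, writing to disk first may be the better choice.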