Tags: python, macos, ssl, web-scraping, sslv3

python: sslv3 alert handshake failure when scraping a site


I am using requests to scrape Project Gutenberg. When I run:

import requests
requests.get("https://www.gutenberg.org/wiki/Science_Fiction_(Bookshelf)", verify=True)

I get the error:

SSLError                                  Traceback (most recent call last)
<ipython-input-33-15981c36e1d3> in <module>()
----> 1 requests.get("https://www.gutenberg.org/wiki/Science_Fiction_(Bookshelf)", verify=True)

/Library/Python/2.7/site-packages/requests/api.pyc in get(url, params, **kwargs)
     67 
     68     kwargs.setdefault('allow_redirects', True)
---> 69     return request('get', url, params=params, **kwargs)
     70 
     71 

/Library/Python/2.7/site-packages/requests/api.pyc in request(method, url, **kwargs)
     48 
     49     session = sessions.Session()
---> 50     response = session.request(method=method, url=url, **kwargs)
     51     # By explicitly closing the session, we avoid leaving sockets open which
     52     # can trigger a ResourceWarning in some cases, and look like a memory leak

/Library/Python/2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    463         }
    464         send_kwargs.update(settings)
--> 465         resp = self.send(prep, **send_kwargs)
    466 
    467         return resp

/Library/Python/2.7/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
    571 
    572         # Send the request
--> 573         r = adapter.send(request, **kwargs)
    574 
    575         # Total elapsed time of the request (approximately)

/Library/Python/2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    429         except (_SSLError, _HTTPError) as e:
    430             if isinstance(e, _SSLError):
--> 431                 raise SSLError(e, request=request)
    432             elif isinstance(e, ReadTimeoutError):
    433                 raise ReadTimeout(e, request=request)

SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)

This was working a few days ago, and I was able to scrape the page. I haven't changed anything in my code. I did install Heroku and Postgres, and I don't know whether that is causing the error. I can still make requests to google.com and other pages. I am using Python 2.7.10 on Mac OS X 10.10.5.

How do I get past this error to scrape the gutenberg page? I don't really understand this error, so any help would be appreciated.


Solution

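  • It looks like they installed a new SSL certificate recently (Sept. 21, 2015), and when doing so they must have tightened their security settings: the site now accepts only TLS 1.2 connections (no SSLv3, TLS 1.0, or TLS 1.1). Note that "sslv3 alert handshake failure" is simply OpenSSL's name for the handshake-failure alert; it means the server rejected the handshake, not necessarily that your library offered SSLv3.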

    See the results of their SSL scan here.

    The reason it stopped working has nothing to do with your code. They changed their allowed security protocols, and your system's OpenSSL version doesn't appear to support TLS 1.2.

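    Try upgrading the OpenSSL libraries on your computer, and then you should be able to connect to the site again (sorry, I don't know the specifics of updating OpenSSL libraries on Mac for Python). Keep in mind that Python's ssl module has to be built against the upgraded library for the change to take effect.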
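
One way to check the diagnosis above, before and after upgrading, is to ask Python which OpenSSL it is linked against and whether its ssl module knows about TLS 1.2. The following is only a minimal sketch using the standard library; the helper names `tls12_supported` and `negotiated_tls_version` are made up for illustration, and the second helper assumes Python 2.7.9+ or 3.5+ (where `ssl.create_default_context` and `SSLSocket.version()` are both available).

```python
import socket
import ssl


def tls12_supported():
    """Return True if this interpreter's ssl module can speak TLS 1.2."""
    # Python 3.7+ exposes an explicit flag. Older versions (including 2.7.9+)
    # define PROTOCOL_TLSv1_2 only when linked against OpenSSL 1.0.1 or newer.
    if hasattr(ssl, "HAS_TLSv1_2"):
        return bool(ssl.HAS_TLSv1_2)
    return hasattr(ssl, "PROTOCOL_TLSv1_2")


def negotiated_tls_version(host, port=443, timeout=10):
    """Connect to host and return the negotiated protocol name, e.g. 'TLSv1.2'.

    Hypothetical helper: needs network access, and raises ssl.SSLError if the
    handshake fails (as it does in the traceback in the question).
    """
    context = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=timeout)
    tls_sock = None
    try:
        tls_sock = context.wrap_socket(sock, server_hostname=host)
        return tls_sock.version()
    finally:
        # Close whichever socket object currently owns the connection.
        if tls_sock is not None:
            tls_sock.close()
        else:
            sock.close()


if __name__ == "__main__":
    # Local checks only; no network traffic here.
    print("Linked OpenSSL: {0}".format(ssl.OPENSSL_VERSION))
    print("TLS 1.2 available: {0}".format(tls12_supported()))
```

If `tls12_supported()` returns False, the interpreter is still using an OpenSSL build without TLS 1.2, whatever other copies are installed on the machine. Once it returns True, calling `negotiated_tls_version("www.gutenberg.org")` should confirm that a handshake with the site now succeeds.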