My code is below:
import urllib.request
import urllib.parse
from lxml import etree
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
url = url + urllib.parse.quote(word)
print('search: ', word)
req = urllib.request.Request(url=url, headers=HEADERS, method='GET')
response = urllib.request.urlopen(req)
text = response.read().decode('utf-8')
This works fine, but after about 400 requests, I got this error:
urllib.error.URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1076)>
What might be the cause of this?
The connection is being rejected during the TLS handshake for some reason. It could just be something transient (and a retry would work) or it may be due to a mismatch due to TLS protocol versions or ciphers. Or it could also be for other reasons that aren't immediately obvious, like a block list, or a broken server etc.
Adding some detection to your code that retries a few times then skips is probably a good general approach. Something like:
for i in range(0,3):
try:
# CONNECT
except:
continue
break
If you want to understand exactly why this particular URL is failing, the easiest solution is to download a copy of wireshark and see what is happening on the wire. There will likely be a TLS error, and possibly an alert code and message that gives more information.