First ever question: I'm getting the following error:
File "D:\Anaconda\Lib\site-packages\requests\api.py", line 70, in get return request('get', url, params=params, **kwargs)
File "D:\Anaconda\Lib\site-packages\requests\api.py", line 56, in request return session.request(method=method, url=url, **kwargs)
File "D:\Anaconda\Lib\site-packages\requests\sessions.py", line 475, in request resp = self.send(prep, **send_kwargs)
File "D:\Anaconda\Lib\site-packages\requests\sessions.py", line 596, in send r = adapter.send(request, **kwargs)
File "D:\Anaconda\Lib\site-packages\requests\adapters.py", line 497, in send raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 2] No such file or directory
This traces back to one line of code here:
import requests, os, bs4, calendar #, sys
import urllib.request

while not year > 2016:
    print('Downloading page {}...'.format(url))
    res = requests.get(loginpageURL, verify='false', auth=('username', 'password'))  # this is the line that doesn't work
    res = requests.get(url, verify='false')  # but I have tried it without that line and this line also doesn't work
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text)
    print(soup)
I have researched the issue extensively and come to the conclusion that it is actually an issue with the requests/urllib3 libraries themselves.
At first, I tried the verify='false' fix suggested here. It didn't work. Someone here said to install new OpenSSL and certifi; both appear to be installed and up to date on my system. There is a great writeup of the bug here, but no solution from what I could see. It has also been identified as a known issue on GitHub here.
When, following this answer, I tried changing verify='false' to verify='cacert.pem' (which I included in the project directory), it threw this error: requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
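For completeness, that attempt looked like this (the same call as above, with the bundle path swapped in; cacert.pem sits next to the script):

res = requests.get(loginpageURL, verify='cacert.pem', auth=('username', 'password'))  # raises CERTIFICATE_VERIFY_FAILED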
Now I'm sitting here just wanting to get this one code snippet to run - I'm trying to bulk download a few hundred zip files from a website - in spite of the known issue with the library. I'm relatively new to Python, and even newer to web scraping, so this is a steep learning curve for me. Any help would be appreciated. Do I need to go so far as scrapping requests?
Thanks!
res = requests.get(loginpageURL, verify='false', ...
verify takes either a boolean (i.e. True or False) or a path, which is then used as the path to the trust store. Your 'false' is a string, not a boolean, so requests tries to use a file named false as the CA store. That file cannot be found, which results in No such file or directory.
To fix this you have to use verify=False, i.e. the boolean value.
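A minimal sketch of the corrected call, reusing the placeholders (loginpageURL, username, password) from your snippet:

import requests

# verify expects a boolean or a path to a CA bundle file;
# the boolean False disables certificate validation entirely
res = requests.get(loginpageURL, verify=False, auth=('username', 'password'))
res.raise_for_status()

Note that urllib3 will emit an InsecureRequestWarning for every such call; that is expected whenever validation is switched off.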
Apart from that, disabling validation is a bad idea and should only be done for testing, or when the security offered by TLS is completely irrelevant to the program. For a login page like yours, disabling validation is especially bad, because a man in the middle can then easily sniff the username and password.
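If the site presents a certificate from a publicly trusted CA, a better option is to keep validation on and point verify at the Mozilla CA bundle that ships with certifi (which you say is already installed). A sketch, again using your loginpageURL placeholder:

import certifi
import requests

# certifi.where() returns the filesystem path to certifi's bundled cacert.pem,
# so the server certificate is validated against a current trust store
res = requests.get(loginpageURL, verify=certifi.where(), auth=('username', 'password'))
res.raise_for_status()

If this still fails with CERTIFICATE_VERIFY_FAILED, the server is most likely using a self-signed or private-CA certificate, and you would have to obtain that CA certificate and pass its path to verify instead.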