I know this has been asked in many forms already, but I can't seem to find my answer and hope to receive some help here. I'm trying to download files from a list of URLs.
I've found the following code that should do what I want:
import os.path
import urllib.request

for link in links:
    link = link.strip()
    name = link.rsplit('/', 1)[-1]
    filename = os.path.join('downloads', name)
    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(link, filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')
I always receive: HTTP Error 400: Bad Request.
I tried setting a user-agent header to fake a browser visit (I use Google Chrome), but it did not help at all. The links work when copied into the browser, so I wonder how to solve this.
I found the answer to my own question.
The problem was that the URLs contained spaces, which urllib.request apparently cannot handle. The solution is to percent-encode each URL with urllib.parse.quote first and then download from the quoted URL.
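For example (a minimal sketch with a made-up URL, not one from my actual list), quoting replaces each space with %20, while the safe=':/' argument keeps the scheme and path separators intact:

```python
import urllib.parse

# Hypothetical URL containing a space
link = 'http://example.com/my report.pdf'

# Percent-encode everything except ':' and '/', so the scheme
# and the path separators are left as-is
quoted_url = urllib.parse.quote(link, safe=':/')
print(quoted_url)  # http://example.com/my%20report.pdf
```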
This is the working code for everyone who runs into the same problem:
import os.path
import urllib.parse
import urllib.request

for link in urls:
    link = link.strip()
    name = link.rsplit('/', 1)[-1]
    filename = os.path.join(name)
    quoted_url = urllib.parse.quote(link, safe=':/')
    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(quoted_url, filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')