I am trying to write a function that takes a URL and a path and downloads the file to that path, but only if it is a text file.
```python
import os
import re
import urllib.request

mcBethURL = 'https://ia802707.us.archive.org/1/items/macbeth02264gut/0ws3410.txt'


def download_file(url, path, local_filename):
    try:
        url_type = urllib.request.urlopen(url).info()['content-type']
        if bool(re.search('t[e]*xt', url_type)):
            local_filename = url.split('/')[-1]
            location = os.path.join("/{}/{}".format(path, local_filename))
            urllib.request.urlretrieve(url, path, filename=local_filename)
        else:
            print('No text file found at given URL, download aborted!')
        # some more exceptions here yet not relevant
    except:
        print('invalid url')


download_file(mcBethURL, '/home/wilma/PycharmProjects/Uni', 'mcBeth')
```
`urllib.request.urlretrieve(url, path, filename=local_filename)` doesn't work: the function just prints `invalid url`. However, `urllib.request.urlretrieve(url, filename=local_filename)` works, but then I cannot specify a path. I added the path parameter after reading "How to download to a specific directory?".

Do you have any idea why I cannot call `urlretrieve` with both a path variable and a name for the file the download should be saved in?
Looking at "What command to use instead of urllib.request.urlretrieve?", it seems that `urllib.request.urlretrieve` is on its way out (the Python docs list it under the legacy interface that might become deprecated in the future), so you might consider using `shutil.copyfileobj` or `requests.get` instead. For the legacy interface you are using, this example from the docs seems relevant:
```python
import urllib.request
local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()
```
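A minimal sketch of the `shutil.copyfileobj` route, assuming you only need to stream the response body to disk. The name `download_with_copyfileobj` is hypothetical. The directory and file name are combined with `os.path.join`, so the destination is a single explicit path.

```python
import os
import shutil
import urllib.request


def download_with_copyfileobj(url, directory, local_filename):
    """Stream `url` into directory/local_filename without using urlretrieve."""
    location = os.path.join(directory, local_filename)
    # urlopen returns a file-like response; copyfileobj copies it in chunks,
    # so the whole file never has to fit in memory at once.
    with urllib.request.urlopen(url) as response, open(location, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return location
```

For example, `download_with_copyfileobj(mcBethURL, '/home/wilma/PycharmProjects/Uni', 'mcBeth.txt')` would save the play as `mcBeth.txt` in that directory, provided the directory already exists.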
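In the docs the signature is `urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)`. There is no separate path parameter: the second positional argument *is* `filename`. In your call, `path` is bound to `filename` positionally and `filename=local_filename` then supplies it a second time, which raises `TypeError: urlretrieve() got multiple values for argument 'filename'`. Your bare `except:` swallows that error and prints `invalid url`. Join the directory and the file name into one path and pass that as `filename` instead.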
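A sketch of the corrected function, assuming a `text/*` content type is what "text file" should mean (this swaps the original `t[e]*xt` regex for a `startswith('text/')` check). The names `download_text_file` and `reproduce_original_error` are hypothetical. The second helper shows that the original call fails during argument binding, before any network access, and the bare `except:` is dropped so real errors are no longer reported as `invalid url`.

```python
import os
import urllib.request


def reproduce_original_error():
    """Return the message of the TypeError raised by the original call.

    The error happens while binding arguments, so urlretrieve never runs
    and no network request is made.
    """
    try:
        urllib.request.urlretrieve('https://example.com/x.txt', '/tmp', filename='mcBeth')
    except TypeError as exc:
        return str(exc)
    return None


def download_text_file(url, directory, local_filename=None):
    """Save `url` as directory/local_filename if it is served as text/*.

    Returns the saved path, or None when the content type is not text.
    """
    # Check the Content-Type header first (this opens the URL once).
    with urllib.request.urlopen(url) as response:
        content_type = response.info().get('Content-Type', '')
    if not content_type.startswith('text/'):
        print('No text file found at given URL, download aborted!')
        return None

    # Fall back to the last segment of the URL when no name is given.
    if local_filename is None:
        local_filename = url.split('/')[-1]

    # urlretrieve has no separate directory parameter, so the full
    # destination path goes into its second argument, `filename`.
    location = os.path.join(directory, local_filename)
    saved_path, _headers = urllib.request.urlretrieve(url, location)
    return saved_path
```

With the URL from the question, `download_text_file(mcBethURL, '/home/wilma/PycharmProjects/Uni', 'mcBeth.txt')` should write the file into that directory. Note that this version contacts the server twice (once for the header check, once for the download); `HTTPError`, `URLError` or a missing target directory now surface as their own exceptions.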