I am using requests in Python to get a download's filename from the Content-Disposition header. When the header has no filename, I fall back to generating a name from the URL itself. For some URLs this does not work, for example https://www.seedr.cc/zip/88714186?st=fa176033e056f391a766486e690bbcf0b2720842c31cac289a91738304636bac&e=1589129102.
For this URL I cannot get the filename from the URL, and there is no Content-Disposition header. Yet download managers like IDM, and any browser, get the filename without any issue.
For the above link, IDM names the file "8. Post Interview.zip", while my code produces "88714186.zip".
My code snippet is:
import os
import re
import mimetypes
from urllib.parse import urlparse

import requests

useragent = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686 on x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2820.59 Safari/537.36'}


def fix_fileName(response, fileName):
    # Append an extension guessed from Content-Type when the name has none.
    name, extension = os.path.splitext(fileName)
    if not extension:
        mime = response.headers.get('Content-Type', '').split(';')[0].strip()
        if mime and mime != 'application/octet-stream':
            # guess_extension returns None for MIME types it does not know
            extension = mimetypes.guess_extension(mime) or ''
    return name + extension


def downloader(url):
    with requests.get(url, stream=True, headers=useragent) as response:
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
        print(response.headers)
        # .get() avoids a KeyError when the header is absent
        disposition = response.headers.get('Content-Disposition', '')
        match = re.search(r'filename=(.+)', disposition)
        if match:
            fileName = match.group(1).strip('"')
        else:
            fileName = os.path.basename(urlparse(url).path)
        fileName = fix_fileName(response, fileName)
        with open(fileName, 'wb') as output_file:
            # Write in chunks so stream=True does not load the whole file into memory
            for chunk in response.iter_content(chunk_size=8192):
                output_file.write(chunk)


def main():
    url = 'https://www.seedr.cc/zip/88714707?st=01607f3f1b4adac3f8bf6292fdbac137207de1defb75646daafc9781dda8dc26&e=1589129561'
    downloader(url)


if __name__ == "__main__":
    main()
How can I get the same filename in Python that IDM and browsers show? Please help me with a solution.
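This URL redirects. See the REDbot report at https://redbot.org/?uri=https%3A%2F%2Fwww.seedr.cc%2Fzip%2F88714186%3Fst%3Dfa176033e056f391a766486e690bbcf0b2720842c31cac289a91738304636bac%26e%3D1589129102. Follow the redirect (the Location header field) and work out the filename from the URL it points to, not from the URL you originally requested. Note that requests follows redirects by default, so response.url is already the final URL while the url variable in your code is still the original one.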
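
A minimal sketch of that idea, assuming the redirect target's path ends in the real filename (I have not verified this for the seedr.cc link, whose token has likely expired). The helper names filename_from_url and filename_from_response are hypothetical, and the example.com URL in the comment is only an illustration. The helpers read nothing but response.headers and response.url, so they should work with any requests response:

```python
import os
import re
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the percent-decoded last path segment of url ('' if there is none)."""
    return unquote(os.path.basename(urlparse(url).path))


def filename_from_response(response):
    """Pick a filename from a requests-style response object (hypothetical helper).

    Only response.headers and response.url are read. With requests,
    response.url is the URL reached after redirects were followed.
    """
    disposition = response.headers.get("Content-Disposition", "")
    # Handles only the plain filename=... form; the RFC 5987
    # filename*=... form is deliberately not parsed in this sketch.
    match = re.search(r'filename="?([^";]+)"?', disposition)
    if match:
        return match.group(1).strip()
    # Fall back to the final URL, e.g. (illustrative URL only)
    # https://example.com/files/8.%20Post%20Interview.zip
    return filename_from_url(response.url)
```

In the question's downloader, this would replace the filename logic: call filename_from_response(response) inside the with requests.get(...) block instead of parsing the original url. Printing response.history there also shows which redirects were followed.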