Tags: python, python-requests, python-3.7, content-disposition

How do I retrieve the filename if it is not present in Content-Disposition or in the URL itself?


I am trying to use requests in Python to get the filename from the Content-Disposition header, but the filename is not present there. I also tried to generate the name from the URL itself, but that does not work for some URLs, for example https://www.seedr.cc/zip/88714186?st=fa176033e056f391a766486e690bbcf0b2720842c31cac289a91738304636bac&e=1589129102

I cannot get the filename from the URL, and there is no Content-Disposition header. Yet when I use a download manager such as IDM, or even any browser, I get the filename without any issue.

For the URL above, the name generated by IDM is "8. Post Interview.zip", while the filename my code produces is "88714186.zip".

My code snippet:

import mimetypes
import os
import re
from urllib.parse import unquote, urlparse

import requests

useragent = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686 on x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2820.59 Safari/537.36'}

def fix_fileName(response, fileName):
    # Append an extension guessed from Content-Type if the name has none
    name, extension = os.path.splitext(fileName)
    if extension:
        return fileName
    # Drop parameters such as "; charset=utf-8" before guessing
    mime = response.headers.get('Content-Type', '').split(';')[0].strip()
    if mime and mime != 'application/octet-stream':
        # guess_extension() returns None for unknown types
        extension = mimetypes.guess_extension(mime) or ''
    return name + extension

def downloader(url):
    with requests.get(url, stream=True, headers=useragent) as response:
        response.raise_for_status()  # raise on 4xx/5xx responses
        print(response.headers)
        # .get() avoids a KeyError when the header is missing entirely
        disposition = response.headers.get('Content-Disposition', '')
        if 'filename' in disposition:
            fileName = re.findall("filename=(.+)", disposition)[0].strip('"')
        else:
            # Fall back to the last path segment of the requested URL
            fileName = unquote(os.path.basename(urlparse(url).path))
        fileName = fix_fileName(response, fileName)

        with open(fileName, 'wb') as output_file:
            # Write in chunks instead of loading the whole body into memory
            for chunk in response.iter_content(chunk_size=8192):
                output_file.write(chunk)

def main():
    url = 'https://www.seedr.cc/zip/88714707?st=01607f3f1b4adac3f8bf6292fdbac137207de1defb75646daafc9781dda8dc26&e=1589129561'
    downloader(url)

if __name__ == "__main__":
    main()

How can I accomplish this in Python? Please help me with a solution.
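Even when a Content-Disposition header is present, the `re.findall("filename=(.+)", ...)` line in the code above only handles the plain `filename=` form: it keeps everything after the name (for example `; size=123`), and it finds no match for the `filename*=UTF-8''...` form used for non-ASCII names, so `[0]` raises IndexError. Below is a minimal sketch of a sturdier parser, assuming the standard library's `email.message.Message` parameter parsing (MIME and RFC 2231 rules, which overlap with but are not identical to the HTTP rules in RFC 6266) is good enough. The function name `filename_from_content_disposition` is hypothetical, and the sketch does not specifically implement RFC 6266's preference for `filename*` when both forms are sent.

```python
import os
from email.message import Message


def filename_from_content_disposition(value):
    """Return the filename in a Content-Disposition header value, or None.

    Hypothetical helper. email.message.Message parses MIME header
    parameters: quoted values, extra parameters after ';', and the
    RFC 2231 extended form filename*=UTF-8''... for non-ASCII names.
    """
    if not value:
        return None
    msg = Message()
    msg['Content-Disposition'] = value
    name = msg.get_filename()
    if not name:
        return None
    # Keep only the last path component so a server-supplied name
    # such as "../../x.zip" cannot point outside the current directory
    return os.path.basename(name)
```

In `downloader()`, `filename_from_content_disposition(response.headers.get('Content-Disposition'))` could stand in for the regex branch, falling back to the URL whenever it returns `None`.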


Solution

  • This URL redirects. See https://redbot.org/?uri=https%3A%2F%2Fwww.seedr.cc%2Fzip%2F88714186%3Fst%3Dfa176033e056f391a766486e690bbcf0b2720842c31cac289a91738304636bac%26e%3D1589129102. Follow the redirect (Location header field).
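In requests, a GET follows redirects by default: `response.url` is the final URL after the redirect chain, and `response.history` holds the intermediate 3xx responses, each carrying a `Location` header. The question's code builds its fallback name from `url`, the URL it requested, so it can only ever produce `88714186`. The sketch below assumes, as this answer suggests, that the redirect target's path ends in the real filename; it resolves a chain of `Location` values (which may be relative) and takes the percent-decoded last path segment. Both helper names, `resolve_redirect_chain` and `filename_from_url`, are hypothetical, and the example URLs in any test are made up.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlparse


def resolve_redirect_chain(start_url, locations):
    """Apply a sequence of Location header values to start_url.

    Hypothetical helper. A Location value may be relative, so each one
    is resolved against the URL that produced it.
    """
    url = start_url
    for location in locations:
        url = urljoin(url, location)
    return url


def filename_from_url(url, default='download'):
    """Return the percent-decoded last path segment of url, or default.

    Hypothetical helper. The query string (e.g. ?st=...&e=...) is ignored.
    """
    # URL paths always use '/', so posixpath behaves the same on every OS
    name = posixpath.basename(urlparse(url).path)
    return unquote(name) or default
```

With requests' default redirect handling, using `filename_from_url(response.url)` in place of `os.path.basename(urlparse(url).path)` in `downloader()` should be enough. `resolve_redirect_chain(url, [r.headers['Location'] for r in response.history])` should normally arrive at the same URL; it mainly matters if you pass `allow_redirects=False` and follow each `Location` header yourself.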