I'm using the Python requests library to get a PDF file from the web. This works fine, but I now also want the original filename. If I go to a PDF file in Firefox and click download, it already has a filename defined to save the PDF. How do I get this filename?
For example:
import requests
r = requests.get('http://www.researchgate.net/profile/M_Gotic/publication/260197848_Mater_Sci_Eng_B47_%281997%29_33/links/0c9605301e48beda0f000000.pdf')
print(r.headers['content-type'])  # prints 'application/pdf'
I checked r.headers for anything interesting, but there's no filename in there. I was actually hoping for something like r.filename.
Does anybody know how I can get the filename of a downloaded PDF file with the requests library?
It is specified in the HTTP Content-Disposition header. To extract the name you would do:
import re
d = r.headers['content-disposition']
fname = re.findall("filename=(.+)", d)[0]
The name is extracted from the header string with a regular expression (the re module).
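The regular expression above assumes the header is present and that the filename is the last, unquoted parameter. Servers often omit Content-Disposition entirely or send the name in quotes, so a somewhat more defensive version may be useful. The following is only a sketch: filename_from_response is a hypothetical helper name, it parses the header with the standard library's email.message.Message rather than a regex, and the fallback to the last segment of the URL path is an assumption about what you would want when the server suggests no name.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlparse


def filename_from_response(headers, url):
    """Return the filename suggested by the server, else one derived from the URL.

    `headers` is any mapping with a .get() method (for example r.headers),
    and `url` is the address that was fetched (for example r.url).
    """
    disposition = headers.get("content-disposition")
    if disposition:
        # Let the stdlib header parser handle quoting and extra parameters.
        msg = Message()
        msg["content-disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the final path component of the suggested name.
            return os.path.basename(name)
    # Assumed fallback: no usable header, so use the last segment of the URL path.
    return os.path.basename(unquote(urlparse(url).path))
```

With the response from the question this would be called as filename_from_response(r.headers, r.url). Because r.headers is case-insensitive, the lowercase key works regardless of how the server capitalises the header.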