I am trying to download an image from a URL with Python using the requests and shutil libraries. My code is below:
import requests
import shutil
image_url = "https://www.metmuseum.org/-/media/images/visit/met-fifth-avenue/fifthave_teaser.jpg"
with open("image1.jpg", "wb") as file:
    response = requests.get(image_url, stream=True)
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, file)
file.close()
This code works for most other image URLs that I have tried (e.g. https://tinyjpg.com/images/social/website.jpg). However, for the image_url in the code, a 1 KB file is created, and opening it gives an error that says "It looks like we don't support this file format."
I have also tried:
import urllib.request
urllib.request.urlretrieve(image_url, "image1.jpg")
It is possible to do this using Selenium Wire: I used driver.requests to get a list of all requests made by the site, and then looped through these requests until I found one whose request.response.headers included the file type (.jpg). It appears that there are two requests with the same URL (the first with content-type 'text/html' and the second with 'image/jpg').
I would like to run this without loading a WebDriver. Is there any way I can download an image like this using the requests library?
If you view response.text, you'll see that the server doesn't like your request headers and thinks you're a robot:
'<html>\r\n<head>\r\n<META NAME="robots" CONTENT="noindex,nofollow">\r\n<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">\r\n</script>\r\n<body>\r\n</body></html>\r\n'
But if you provide a proper User-Agent header, the response changes and you can proceed with saving the file:
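# Identify as a regular desktop browser instead of python-requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'}
# response.content reads the whole body into memory, so stream=True is not needed here
response = requests.get(image_url, headers=headers)
with open("image1.jpg", "wb") as file:
    file.write(response.content)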
So you have to send a browser-like User-Agent in the request headers to get this image.
Also, the with statement uses a context manager, so it already closes the file for you and the explicit file.close() is not needed.
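Putting the pieces together, here is a minimal sketch that sends a browser-like User-Agent, checks the Content-Type before writing anything, and streams the body to disk as in the question. The names BROWSER_HEADERS, is_image and download_image are hypothetical helpers made up for this example, and the 10 second timeout is an arbitrary choice. Whether the User-Agent alone is enough depends on the site's bot protection, so treat this as a starting point rather than a guaranteed fix.

```python
import shutil

import requests

# Assumption: a desktop-browser User-Agent string is accepted by the server;
# this one is the Chrome string used earlier in this answer.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
    )
}


def is_image(content_type):
    """Return True if a Content-Type header value names an image type."""
    return (content_type or "").strip().lower().startswith("image/")


def download_image(url, path, timeout=10):
    """Hypothetical helper: stream url to path, refusing non-image replies."""
    with requests.get(url, headers=BROWSER_HEADERS, stream=True,
                      timeout=timeout) as response:
        response.raise_for_status()
        content_type = response.headers.get("Content-Type")
        if not is_image(content_type):
            # e.g. the text/html bot-check page shown above
            raise ValueError(f"expected an image, got {content_type!r}")
        # Let urllib3 undo gzip/deflate so the raw stream holds image bytes
        response.raw.decode_content = True
        # Open the file only after the check, so no 1 KB junk file is left behind
        with open(path, "wb") as file:
            shutil.copyfileobj(response.raw, file)
```

With that in place, download_image(image_url, "image1.jpg") either saves the picture or raises an error naming the content type the server actually returned, which makes the blocked case easier to spot than a broken .jpg file.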