I wanted to make a tool to save images from a specific link, but encountered a problem.
My code is the following:
import urllib
urllib.urlretrieve(url, "img.jpg")
If I use any image link from Google, it works flawlessly.
For example:
(source: asha.org)
But if I want to get this specific image:
(source: keepeek-cache.com)
It saves the file as .jpg, but when I try to open it I get an "unsupported file format" error. Any ideas on how to fix it, or what the reason behind it is?
The problem is that the website blocks downloads based on the browser signature. Rename your img.jpg file to page.html and open it in a browser; you will see something like this:
Error 1010 Ray ID: xxxxxxxxx • 2018-06-08 10:39:01 UTC
Access denied
What happened?
The owner of this website (asset.keepeek-cache.com) has banned your access based on your browser's signature (xxxxxxxxxx).
Cloudflare Ray ID: xxxxxxxxxx • Your IP: xx.xx.xx.xx • Performance & security by Cloudflare
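Instead of renaming the file and opening it in a browser, you could also check what was actually saved by looking at its first bytes. The sketch below is written for Python 3; `sniff_file` is a hypothetical helper, and the check is only a heuristic: real JPEG files start with the bytes FF D8 FF, while an error page like the one above usually starts with a doctype or an `<html>` tag.

```python
# JPEG files begin with the SOI marker FF D8, followed by another marker starting with FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_file(path):
    """Guess "jpeg", "html" or "unknown" from the first bytes of a file.

    Hypothetical helper: a rough heuristic, not a full format validator.
    """
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(JPEG_MAGIC):
        return "jpeg"
    # HTML error pages (such as Cloudflare's) usually start with a doctype or <html> tag.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"
```

Calling `sniff_file("img.jpg")` on the broken download would be expected to return "html" rather than "jpeg" if the site served an error page.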
Once you have considered whether you want to contravene the website owner's wishes, you can change your user agent, for instance:
import urllib
# Change user agent to look like Firefox
urllib.URLopener.version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
# Download file with new user agent
urllib.urlretrieve(url, "img.jpg")
which fixed the problem for me.
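The code above is Python 2. In Python 3, `urlretrieve` lives in `urllib.request`, and a common way to send a different User-Agent is to pass it as a header on a `urllib.request.Request`. The sketch below assumes that approach; the helper names `build_request` and `download_image` and the Firefox User-Agent string are illustrative choices, not taken from the original answer. Note that a site may also block requests based on signals other than the User-Agent, so this is not guaranteed to work for every server.

```python
import shutil
import urllib.request

# Example browser-like User-Agent string (an assumption; any current browser string would do).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) "
    "Gecko/20100101 Firefox/115.0"
)


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image(url, path, user_agent=BROWSER_UA):
    """Fetch url with a custom User-Agent and stream the response body into path."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Usage would be `download_image(url, "img.jpg")`, with `url` holding the image address as in the question.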