python python-2.7 urllib

urllib: downloading image gives unsupported format


I wanted to make a tool to save images from a specific link, but encountered a problem.

My code is the following:

import urllib

urllib.urlretrieve(url, "img.jpg")

The thing is that if I use any link from Google, it works flawlessly.

For example:

link
(source: asha.org)

  • works

But if I want to get this specific image:

link
(source: keepeek-cache.com)

It saves the file as .jpg, but when I try to open it I get an "unsupported file format" error. Any ideas on how to fix this, or what the reason behind it is?


Solution

  • The problem is that the website is blocking the download based on the client's signature (its user agent). The file you saved is not an image but an HTML error page: rename img.jpg to page.html and open it in a browser, and you will see something like this:

    Error 1010 Ray ID: xxxxxxxxx • 2018-06-08 10:39:01 UTC

    Access denied

    What happened?

    The owner of this website (asset.keepeek-cache.com) has banned your access based on your browser's signature (xxxxxxxxxx).

    Cloudflare Ray ID: xxxxxxxxxx • Your IP: xx.xx.xx.xx • Performance & security by Cloudflare

    Once you have considered whether you want to go against the website owner's wishes, you can change your user agent, for instance like this:

    import urllib
    
    # Change user agent to look like Firefox
    urllib.URLopener.version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
    # Download file with new user agent
    urllib.urlretrieve(url, "img.jpg")
    

    This fixed the problem for me.
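The answer above targets Python 2.7. For readers on Python 3, the same idea can be sketched roughly as follows. This is an illustrative sketch rather than part of the original answer: the helper names `build_request`, `looks_like_jpeg` and `download_jpeg` are made up for this example, and the user-agent string is simply the one from the answer. It also checks the JPEG signature bytes before saving, so an HTML error page is not silently written to img.jpg.

```python
import urllib.request

# User-agent string copied from the answer above; whether a given site
# accepts it is an assumption, and other browser-like values may be needed.
FIREFOX_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) "
              "Gecko/20071127 Firefox/2.0.0.11")


def build_request(url, user_agent=FIREFOX_UA):
    """Build a request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG signature FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"


def download_jpeg(url, path, user_agent=FIREFOX_UA):
    """Download url to path, refusing to save anything that is not a JPEG."""
    # In Python 3, urlopen raises urllib.error.HTTPError for error statuses
    # such as 403 instead of returning the error page as content.
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        data = response.read()
    if not looks_like_jpeg(data):
        raise ValueError("response is not a JPEG; first bytes: %r" % data[:60])
    with open(path, "wb") as f:
        f.write(data)
```

With `url` defined as in the question, this would be called as `download_jpeg(url, "img.jpg")`. The signature check only covers JPEG files; other image formats would need their own check.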