
Images downloaded with Python are corrupted?


I am trying to download images, but they end up corrupted for some reason. For example, this is an image I want to get.

And this is the result.

My test code is:

import urllib2

def download_web_image(url):
    request = urllib2.Request(url)
    img = urllib2.urlopen(request).read()
    with open ('test.jpg', 'w') as f: f.write(img)

download_web_image("http://upload.wikimedia.org/wikipedia/commons/8/8c/JPEG_example_JPG_RIP_025.jpg")

Why does this happen, and how do I fix it?


Solution

  • You are opening the 'test.jpg' file in the default (text) mode, which makes Python translate newlines to the platform's convention on Windows:

    In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n. When writing in text mode, the default is to convert occurrences of \n back to platform-specific line endings.

    Of course, JPEG files are not text files, and 'fixing' the newlines will only corrupt the image. Instead, open the file in binary mode:

    with open('test.jpg', 'wb') as f:
        f.write(img)
    

    For more details, see the documentation.
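
To make the effect concrete, here is a minimal sketch that needs no network access. It is an illustration rather than the exact behaviour of any particular Python build: `payload` is made-up data (not a real image), and `simulate_windows_text_write` and `save_binary` are hypothetical helper names. The first helper imitates the newline translation described above with a plain byte replacement; the second writes in binary mode, as recommended.

```python
import os
import tempfile

# Made-up payload: JPEG start/end markers around a few bytes that include
# 0x0A, the byte value of "\n". This is illustrative data, not a real image.
payload = b"\xff\xd8\xff\xe0" + b"\x0a\x01\x0a\x02" + b"\xff\xd9"


def simulate_windows_text_write(data):
    """Imitate text-mode writing on Windows: every \\n becomes \\r\\n."""
    return data.replace(b"\n", b"\r\n")


def save_binary(data, path):
    """Write the bytes unchanged by opening the file in binary mode."""
    with open(path, "wb") as f:
        f.write(data)


# What the file contents would look like after newline translation.
corrupted = simulate_windows_text_write(payload)

# Binary mode: write, then read back in binary mode and compare.
path = os.path.join(tempfile.mkdtemp(), "test.jpg")
save_binary(payload, path)
with open(path, "rb") as f:
    round_tripped = f.read()

print("original=%d corrupted=%d round_tripped=%d"
      % (len(payload), len(corrupted), len(round_tripped)))
```

Each 0x0A byte in the translated copy gains an extra 0x0D byte in front of it, which shifts everything after it and breaks the image data, while the file written with 'wb' reads back byte for byte identical to the original.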