
Python download image from HTTPS aspx


I am trying to download some images from the NASS Case Viewer. An example of a case is

The link to the image viewer for this case is

which may not be viewable, I assume because it uses HTTPS. However, this is simply the Front second image.

The actual link to the image is (or should be?)

Opening this link simply downloads an .aspx binary.

My problem is that I do not know how to save these binaries as proper JPG files.

Here is the code I've tried:

import requests 
test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)

with open("test_image.jpg", "wb+") as myfile:
    myfile.write(str.encode(pull_image.text))

But this does not produce a valid JPG file. I've also inspected pull_image.raw.read() and found that it was empty.

What could be the issue here? Are my URLs improper? I've used BeautifulSoup to put these URLs together and checked them by inspecting the HTML of a few pages.

Am I saving the binaries incorrectly?


Solution

  • .text decodes the response content into a string, so encoding that string back to bytes corrupts your image file.
    Instead, use .content, which holds the raw binary response body.

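    import requests

    test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
    pull_image = requests.get(test_image)
    pull_image.raise_for_status()  # stop on HTTP errors instead of saving an error page

    # .content is bytes, so it can be written unchanged to a file opened in binary mode
    with open("test_image.jpg", "wb") as myfile:
        myfile.write(pull_image.content)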
    

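    .raw.read() also returns bytes, but to use it you must pass stream=True. Without it, requests has already read the whole body into .content, leaving the raw stream empty, which is why pull_image.raw.read() returned nothing in your test.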

    pull_image = requests.get(test_image, stream=True)
    with open("test_image.jpg", "wb") as myfile:
        # .raw is the undecoded urllib3 response; read() pulls the whole body into memory
        myfile.write(pull_image.raw.read())
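To see why going through .text breaks the file, the sketch below round-trips the first bytes of a JPEG through a string the same way the question's str.encode(pull_image.text) does. The sample bytes are an assumed JFIF-style header, not data from the NASS server, and the two decodings (UTF-8 with replacement, and Latin-1) are stand-ins: requests chooses the encoding itself, from the Content-Type header or by guessing.

```python
# Assumed sample: the first 12 bytes of a typical JFIF-style JPEG file.
jpeg_head = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"

# requests builds .text by decoding .content; the actual encoding is chosen
# by requests, so two plausible choices are shown here.
text_utf8 = jpeg_head.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD
text_latin1 = jpeg_head.decode("latin-1")                # each byte -> one char

# The question's str.encode(...) re-encodes with the default codec, UTF-8.
back_utf8 = str.encode(text_utf8)
back_latin1 = str.encode(text_latin1)

print(back_utf8 == jpeg_head)    # False: 4 bytes were replaced by U+FFFD
print(back_latin1 == jpeg_head)  # False: every byte >= 0x80 became 2 bytes
print(back_latin1[:4])           # b'\xc3\xbf\xc3\x98', no longer FF D8 FF
print(len(jpeg_head), len(back_utf8), len(back_latin1))  # 12 20 16
```

Either way the bytes that come back differ from the originals, so the saved file no longer starts with the JPEG marker FF D8 FF. Using .content skips the decode step entirely.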
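If you keep stream=True, a common alternative to reading .raw in one go is iter_content(), which writes the body in chunks and, unlike .raw, decodes any gzip or deflate Content-Encoding. The following is a minimal sketch; the helper name save_streamed_image and the JPEG magic-byte check are hypothetical additions, not part of the answer above. A failed check usually means the URL returned something other than an image, such as an HTML error or session page.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these three bytes


def save_streamed_image(response, path, chunk_size=8192):
    """Write a streamed response body to `path` as raw bytes.

    Hypothetical helper: `response` is assumed to come from
    requests.get(url, stream=True). Returns (bytes_written, looks_like_jpeg).
    """
    response.raise_for_status()  # do not save an HTTP error page as .jpg

    written = 0
    with open(path, "wb") as f:
        # iter_content() yields bytes, with gzip/deflate Content-Encoding decoded
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)

    # Sanity check: an HTML page saved by mistake will not start with FF D8 FF.
    with open(path, "rb") as f:
        looks_like_jpeg = f.read(3) == JPEG_MAGIC

    return written, looks_like_jpeg
```

For the question's URL, it would be called as save_streamed_image(pull_image, "test_image.jpg") inside a with requests.get(test_image, stream=True) as pull_image: block, so the connection is released when the download finishes.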