I am trying to download some images from NASS Case Viewer. An example of a case is
The link to the image viewer for this case is
which may not be viewable, I assume because of the HTTPS. In any case, it is simply the second Front image.
The actual link to the image is (or should be?)
Requesting it simply downloads the binary output of the .aspx page.
My problem is that I do not know how to save this binary data as a proper JPG file.
Here is an example of the code I've tried:
import requests
test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(str.encode(pull_image.text))
But this does not result in a proper JPG file. I've also inspected pull_image.raw.read() and saw that it's empty.
What could be the issue here? Are my URLs improper? I put these URLs together with BeautifulSoup and checked them against the HTML source of a few pages.
Am I saving the binaries incorrectly?
.text decodes the response content to a string, so re-encoding that string and writing it to disk produces a corrupted image file.
Instead, use .content, which holds the response body as raw bytes:
import requests
test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.content)
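To see why the round trip through a string damages the file, here is a small offline sketch. It assumes the bytes are decoded as UTF-8 with invalid sequences replaced, which approximates what .text may do; the encoding requests actually picks depends on the response, so treat this as an illustration rather than an exact reproduction.

```python
# The first four bytes of a typical JPEG file (SOI marker plus APP0 marker).
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Assumption: decode as UTF-8, replacing bytes that are not valid text.
as_text = jpeg_header.decode("utf-8", errors="replace")

# Encoding the string again does not restore the original bytes,
# because every invalid byte was replaced by U+FFFD.
round_tripped = as_text.encode("utf-8")

print(jpeg_header)
print(round_tripped)
print(round_tripped == jpeg_header)  # False
```

None of the original header bytes survive, so image viewers no longer recognise the file as a JPEG.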
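.raw.read() also returns bytes, but to use it you must pass stream=True to requests.get(). Without it, requests has already read the whole body into .content by the time you call .raw.read(), which is why you saw an empty result.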
pull_image = requests.get(test_image, stream=True)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.raw.read())
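If you download many images, a slightly more defensive variant may help. The sketch below is not part of the original answer: save_chunks, looks_like_jpeg and download_image are hypothetical helper names, and the 30-second timeout and 8192-byte chunk size are arbitrary choices. It streams the body with iter_content, raises on HTTP errors, and checks that the saved file starts with the JPEG signature, which would also tell you whether the URL returned an image at all rather than an error page.

```python
import os
import tempfile

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total


def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG signature."""
    with open(path, "rb") as f:
        return f.read(3) == JPEG_MAGIC


def download_image(url, path):
    """Hypothetical helper: stream url to path, then sanity-check the result."""
    import requests  # imported lazily so the helpers above work without it

    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        save_chunks(response.iter_content(chunk_size=8192), path)
    return looks_like_jpeg(path)


# Offline demonstration with made-up data (no network needed).
demo_path = os.path.join(tempfile.gettempdir(), "demo_image.jpg")
save_chunks([b"\xff\xd8\xff\xe0", b"rest of the file"], demo_path)
print(looks_like_jpeg(demo_path))  # True
```

With this in place, download_image(test_image, "test_image.jpg") returning False would suggest the URL served something other than a JPEG.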