Tags: ruby, image, download, corruption, corrupt-data

Validating a downloaded image


This downloads an image to disk:

require "open-uri"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
image = URI.open permalink_url, "rb", &:read
...
File.binwrite "images/#{hash}", image

Sometimes the saved image is corrupted, even though no exception was raised.

  1. How do I check that the image was downloaded correctly, so that I can retry the download if it was not?
  2. Is it normal that no exception was raised? How could this happen? Was a network error silenced by some intermediate server?

UPD: The ImageMagick documentation says that identify "reports if an image is incomplete or corrupt", but here it does not:

$ identify temp.png
temp.png PNG 1080x1080 1080x1080+0+0 8-bit sRGB 2.126MB 0.000u 0:00.049

Here are two corrupted images:

  1. https://drive.google.com/file/d/0B3BLwu7Vb2U-MnNqdHV4MzFSX2s/view?usp=sharing
  2. https://drive.google.com/file/d/0B3BLwu7Vb2U-d3Fab2lmT1hvZlE/view?usp=sharing

UPD: I redownloaded the image and compared the two files: the corrupted copy has 300000 extra bytes in the middle, scattered across many small pieces. The garbage is not just 0x00; it looks random.


Solution

  • Use any of the image handling gems, e.g. chunky_png:

    require 'chunky_png'
    begin
      ChunkyPNG::Datastream.from_file('bad.png')
    rescue ChunkyPNG::CRCMismatch
      puts "png corrupted!"
    end
    

    Edit: Datastream is more efficient than Image in this case, because it only reads the PNG chunks and checks their CRCs without decoding the pixel data.
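To show what a CRC check of this kind catches, here is a minimal stdlib-only sketch that walks the chunks of a PNG and compares each stored CRC-32 with one computed by Zlib. It is not chunky_png's implementation, and `png_intact?` and `PNG_SIGNATURE` are names made up for this example. It only proves that the chunk framing and checksums are consistent, not that the pixel data decodes.

```ruby
require "zlib"

# The fixed 8-byte header that every PNG file starts with.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper: true if the signature is present, every chunk's
# stored CRC-32 matches its type + data, and an IEND chunk is reached.
def png_intact?(data)
  data = data.b
  return false unless data.start_with?(PNG_SIGNATURE)

  pos = PNG_SIGNATURE.bytesize
  # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC-32.
  while pos + 12 <= data.bytesize
    length = data.byteslice(pos, 4).unpack1("N")
    type_and_data = data.byteslice(pos + 4, 4 + length)
    stored_crc = data.byteslice(pos + 8 + length, 4)

    # A truncated file ends before the chunk or its CRC is complete.
    return false if type_and_data.bytesize < 4 + length
    return false if stored_crc.nil? || stored_crc.bytesize < 4

    # The CRC covers the chunk type and the chunk data, not the length.
    return false unless Zlib.crc32(type_and_data) == stored_crc.unpack1("N")
    return true if type_and_data.byteslice(0, 4) == "IEND".b

    pos += 12 + length
  end
  false
end
```

Calling `png_intact?(File.binread("bad.png"))` on a file with random bytes injected into the middle should return false, because the injected bytes either break a chunk's CRC or shift the chunk boundaries.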

    Edit 2: If you want to be able to validate any format that ImageMagick can handle and don't mind calling external binaries, this should work:

    unless system('identify', '-verbose', 'bad.jpg', out: IO::NULL, err: IO::NULL)
      puts "the file can't be opened or is corrupted"
    end
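Either check can then drive the retry the question asks about. The following is only a sketch under assumptions: `download_with_retry` is a hypothetical helper, the block stands in for whatever performs the download, and `valid` is any callable that returns true for good data (for example, one wrapping the chunky_png or identify checks above).

```ruby
# Hypothetical helper: runs the block to fetch the bytes and returns them
# as soon as `valid` accepts them; raises after `attempts` failed tries.
def download_with_retry(attempts: 3, valid:)
  attempts.times do
    data = yield
    return data if valid.call(data)
  end
  raise "download still corrupted after #{attempts} attempts"
end

# Possible usage (not executed here; assumes open-uri and chunky_png are loaded):
#
#   checker = lambda do |bytes|
#     ChunkyPNG::Datastream.from_blob(bytes)
#     true
#   rescue ChunkyPNG::CRCMismatch
#     false
#   end
#   image = download_with_retry(valid: checker) { URI.open(permalink_url, "rb", &:read) }
#   File.binwrite "images/#{hash}", image
```

Validating the bytes in memory before writing them means a corrupted copy never reaches the images directory.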