Tags: image, image-processing, jpeg, corruption

How to find images with a variable sized grey rectangle (JPEG corruption) in them?


I had to recover a hard drive, and a lot of the photos on it came out corrupted. I'm talking about 200,000 photos. I already wrote a script that finds corrupted JPEGs, but some of these images are not corrupted at the file-format level, yet they look like the example shown below. I suspect the grey part is data missing from the file. The size of the grey part varies, and sometimes it includes an incomplete line.

So I'm thinking I could write or find a script that finds grey rectangles in these images.

How do I do this? Something that opens the image data and looks for this giant grey rectangle? I have no idea where to start. I can code in a bunch of languages.

Any help or examples would be much appreciated.

example of corrupted image


Solution

  • I assumed the grey rectangle is always the same colour, (128, 128, 128), so I wrote a function that checks whether that grey is among the 10 most frequent colours in the image. Had the grey varied between files, I would instead have checked whether the most frequent colour is at least 10x more frequent than the second most frequent one.

    Didn't have to learn feature detection this time. Shame. :(

        from collections import Counter
        from PIL import Image

        GREY = (128, 128, 128)  # colour of the missing-data region

        # Function name is illustrative; the original snippet was a bare function body
        def has_grey_block(file):
            # Open the image file and convert it to RGB (if it's not already)
            with Image.open(file) as image:
                # Get a list of all the pixels in the image
                pixels = list(image.convert('RGB').getdata())

            # Count the number of pixels with each RGB value
            counts = Counter(pixels)

            # True if the grey is one of the 10 most frequent colours
            most_common_colors = counts.most_common(10)
            return GREY in [color for color, _ in most_common_colors]
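
The fallback mentioned above (the top colour being at least 10x more frequent than the runner-up) was not needed in the end, but it could be sketched roughly as follows. The names `dominant_color_ratio`, `looks_corrupted` and `min_ratio` are made up for illustration, and the functions take an already opened PIL image rather than a file name so they are easy to try on synthetic data.

```python
from collections import Counter
from PIL import Image


def dominant_color_ratio(image):
    """Return (top_color, ratio), where ratio is the pixel count of the most
    frequent colour divided by the count of the second most frequent one."""
    counts = Counter(list(image.convert("RGB").getdata()))
    top = counts.most_common(2)
    if not top:
        # Image with no pixels: nothing to compare
        return None, 0.0
    if len(top) == 1:
        # Only one colour in the whole image
        return top[0][0], float("inf")
    (color, first), (_, second) = top
    return color, first / second


def looks_corrupted(image, min_ratio=10.0):
    """Flag the image if one colour dominates the runner-up by min_ratio
    (the 10x threshold from the answer above; tune it on your own data)."""
    _, ratio = dominant_color_ratio(image)
    return ratio >= min_ratio
```

For a file on disk this would be called as `looks_corrupted(Image.open(path))`. Unlike the fixed-grey check, this variant does not care which colour dominates, so photos with a large, perfectly uniform area (a blown-out sky, a scanned white page) may be flagged too; treat hits as candidates for review rather than confirmed corruption.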