photo exif metadata-extractor

How do I obtain a hash of the payload of a digital photo container, ideally in Java?


I have edited the EXIF properties of some digital pictures, and would like to be able to identify pictures with the same image data as identical despite those edits. I believe this means extracting the payload stream and computing a hash of it. What is the best way to do this, ideally in Java, and most ideally in Java with a native implementation for performance?


Solution

  • JPEG files are a series of 'segments'. Some contain image data, others don't.

    Exif data is stored in the APP1 segment. You could write some code to compare the other segments and see whether they match. A hash seems like a reasonable approach here. For example, you might hash only the DQT and DHT segments, or the compressed image data that follows the SOS marker (SOI itself is a bare marker with no payload, so there is nothing in it to hash). You'd need to experiment to see which of these gives the best result.

    Check out the JpegSegmentReader class from my metadata-extractor library.

    With that class you can pull out specific segment(s) from a JPEG for processing.

    Let us know how you get on!
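
As a rough illustration of the segment-hashing idea above, here is a minimal sketch that uses only the JDK rather than metadata-extractor. The class name JpegPayloadHash and the method hashIgnoringMetadata are hypothetical names made up for this example. It assumes a well-formed JPEG held fully in memory, walks the marker segments, skips every APPn and COM segment, and feeds everything else (including all bytes from the first SOS marker to the end of the file) into SHA-256.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of metadata-extractor.
class JpegPayloadHash {

    /** Returns a hex SHA-256 of a JPEG, ignoring APPn (Exif, XMP, ...) and COM segments. */
    static String hashIgnoringMetadata(byte[] jpeg) throws NoSuchAlgorithmException {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Not a JPEG: missing SOI marker");
        }
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        int pos = 2; // first byte after SOI
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected a marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) { // optional fill byte before a marker
                pos++;
                continue;
            }
            if (marker == 0xDA) {
                // SOS: hash the scan header, the entropy-coded data and whatever follows.
                // Assumption: nothing after this point is metadata we want to ignore.
                digest.update(jpeg, pos, jpeg.length - pos);
                break;
            }
            // Every other segment expected here carries a 2-byte big-endian length
            // that counts the length field itself but not the marker.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > jpeg.length) {
                throw new IllegalArgumentException("Bad segment length at offset " + pos);
            }
            boolean metadata = (marker >= 0xE0 && marker <= 0xEF) || marker == 0xFE;
            if (!metadata) {
                digest.update(jpeg, pos, 2 + length); // marker + length + payload
            }
            pos += 2 + length;
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

You could call it with something like JpegPayloadHash.hashIgnoringMetadata(Files.readAllBytes(Path.of("a.jpg"))) and compare the resulting strings for two files. Note that skipping all APPn segments also drops things such as ICC profiles, which do affect how the image renders, so you may want to narrow the filter to APP1 only. It also will not help if the editing tool re-encoded the image data rather than just rewriting the metadata.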