Repeatedly loading and saving the same image to the file system alters the image data


Repeatedly saving the same image to the file system and loading it again changes the image data, and therefore its hash sum, which my program depends on.

My program performs the following steps:

1. Create a BufferedImage

BufferedImage bufferedImage = new BufferedImage(400, 400, BufferedImage.TYPE_INT_RGB);
Graphics2D graphics = bufferedImage.createGraphics();
graphics.setColor(Color.RED);
graphics.fillRect(100, 100, 200, 200);
graphics.dispose();

2. Calculate MD5 hash of the created BufferedImage

ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(bufferedImage, "jpg", baos);
byte[] bytesOfImage = baos.toByteArray();
DigestUtils.md5Hex(bytesOfImage); // => bebc7da469524057926f3871bdb07a6a

3. Save BufferedImage to file system

Path tempFile = Files.createTempFile(null, ".jpg"); // the suffix is used verbatim, so include the dot
ImageIO.write(bufferedImage, "jpg", tempFile.toFile());

4. Calculate MD5 hash of file

byte[] bytesOfFile = Files.readAllBytes(tempFile);
DigestUtils.md5Hex(bytesOfFile); // => bebc7da469524057926f3871bdb07a6a

5. Load image from file system

BufferedImage bufferedImageFromFilesystem = ImageIO.read(tempFile.toFile());

6. Calculate MD5 hash of image loaded from file system

ByteArrayOutputStream baosFS = new ByteArrayOutputStream();
ImageIO.write(bufferedImageFromFilesystem, "jpg", baosFS);
byte[] bytesOfImageFromFilesystem = baosFS.toByteArray();
DigestUtils.md5Hex(bytesOfImageFromFilesystem); // => 11dc0e49342a1ad15ab1b5a7f8bc271e

(Repeat steps 3 to 6 but re-use image from step 5:)
7. Store BufferedImage to filesystem

Path tempFile2 = Files.createTempFile(null, ".jpg"); // the suffix is used verbatim, so include the dot
ImageIO.write(bufferedImageFromFilesystem, "jpg", tempFile2.toFile());

8. Calculate MD5 hash of file

byte[] bytesOfFile2 = Files.readAllBytes(tempFile2);
DigestUtils.md5Hex(bytesOfFile2); // => 11dc0e49342a1ad15ab1b5a7f8bc271e

9. Load image from file system

BufferedImage bufferedImageFromFilesystem2 = ImageIO.read(tempFile2.toFile());

10. Calculate MD5 hash of image loaded from file system

ByteArrayOutputStream baosFS2 = new ByteArrayOutputStream();
ImageIO.write(bufferedImageFromFilesystem2, "jpg", baosFS2);
byte[] bytesOfImageFromFilesystem2 = baosFS2.toByteArray();
DigestUtils.md5Hex(bytesOfImageFromFilesystem2); // => d1102e4b7efef384623cac915a21e1c2

(org.apache.commons.codec.digest.DigestUtils is used for MD5 calculation)
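For anyone reproducing this without Commons Codec on the classpath, a roughly equivalent helper can be sketched with the JDK's MessageDigest alone. The class name Md5Hex and its method are hypothetical and not part of the original program; the helper is only meant to produce the same kind of lowercase hex string as DigestUtils.md5Hex.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for DigestUtils.md5Hex, using only the JDK.
class Md5Hex {

    // Returns the MD5 digest of the given bytes as a 32-character lowercase hex string.
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b)); // two hex digits per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Well-known MD5 test vector for the ASCII string "abc".
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.UTF_8)));
        // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```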

Every time I save the image to the file system using code snippet #3 and load it again using code snippet #5, the image data is altered. The size of the image shrinks by a few bytes. The image can still be opened by the standard Windows image viewer and still seems to be valid.

I have already checked whether the issue is caused by the image's metadata. Comparing the metadata of the jpg files with a suitable program does not show any difference.

How can I make sure that loading and saving an identical image does not change the file?


Solution

  • You're saving a JPEG, which is a lossy compressed image format, rather than the raw buffer. Lossy means that the process cannot be reversed, because information is discarded along the way. Encoding as JPEG quantizes the image data to reduce its size, so when you load the file back, the decoded pixels differ from the original ones, hence the changed hash. When you save it again, it is compressed again, which again leads to a different hash when you load it. I suspect that after enough repetitions the image would stop changing and the hash would settle. If you need loading and saving to leave the data unchanged, use a lossless format such as PNG, or hash the raw pixel data instead of the encoded bytes.
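
As a minimal sketch of the difference between lossy and lossless round trips, the following compares a hash of the decoded pixels before and after re-encoding. The class StableImageHash and its helpers (pixelMd5, roundTrip, redSquare) are hypothetical names introduced for illustration; hashing the ARGB values from getRGB is one possible approach, not something taken from the question's code.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
class StableImageHash {

    // Hashes the decoded pixels (as ARGB ints) instead of the encoded file bytes.
    public static String pixelMd5(BufferedImage image) throws NoSuchAlgorithmException {
        int w = image.getWidth();
        int h = image.getHeight();
        int[] argb = image.getRGB(0, 0, w, h, null, 0, w);
        ByteBuffer buffer = ByteBuffer.allocate(argb.length * 4);
        buffer.asIntBuffer().put(argb); // the int view writes into the backing byte array
        byte[] digest = MessageDigest.getInstance("MD5").digest(buffer.array());
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Encodes the image in the given format in memory and decodes it again.
    public static BufferedImage roundTrip(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No writer found for format " + format);
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    // Same test image as in the question: a red square on a black background.
    public static BufferedImage redSquare() {
        BufferedImage image = new BufferedImage(400, 400, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = image.createGraphics();
        graphics.setColor(Color.RED);
        graphics.fillRect(100, 100, 200, 200);
        graphics.dispose();
        return image;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage original = redSquare();
        String originalHash = pixelMd5(original);

        // Lossless: the decoded pixels should match the original exactly.
        System.out.println("PNG pixels unchanged:  "
                + originalHash.equals(pixelMd5(roundTrip(original, "png"))));

        // Lossy: the decoded pixels may differ after each re-encoding.
        System.out.println("JPEG pixels unchanged: "
                + originalHash.equals(pixelMd5(roundTrip(original, "jpg"))));
    }
}
```

The PNG comparison is expected to report unchanged pixels, since PNG stores the data losslessly. The encoded PNG bytes themselves are not guaranteed to be identical across encoders or library versions, which is why this sketch hashes pixels rather than file contents.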