Repeatedly saving and loading the same image to and from the file system changes the image data and therefore its hash sum, which I rely on.
My program performs the following steps:
1. Create a BufferedImage
BufferedImage bufferedImage = new BufferedImage(400, 400, BufferedImage.TYPE_INT_RGB);
Graphics2D graphics = bufferedImage.createGraphics();
graphics.setColor(Color.RED);
graphics.fillRect(100, 100, 200, 200);
graphics.dispose();
2. Calculate MD5 hash of the created BufferedImage
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(bufferedImage, "jpg", baos);
byte[] bytesOfImage = baos.toByteArray();
DigestUtils.md5Hex(bytesOfImage); // => bebc7da469524057926f3871bdb07a6a
3. Save BufferedImage to file system
Path tempFile = Files.createTempFile(null, ".jpg");
ImageIO.write(bufferedImage, "jpg", tempFile.toFile());
4. Calculate MD5 hash of file
byte[] bytesOfFile = Files.readAllBytes(tempFile);
DigestUtils.md5Hex(bytesOfFile); // => bebc7da469524057926f3871bdb07a6a
5. Load image from file system
BufferedImage bufferedImageFromFilesystem = ImageIO.read(tempFile.toFile());
6. Calculate MD5 hash of image loaded from file system
ByteArrayOutputStream baosFS = new ByteArrayOutputStream();
ImageIO.write(bufferedImageFromFilesystem, "jpg", baosFS);
byte[] bytesOfImageFromFilesystem = baosFS.toByteArray();
DigestUtils.md5Hex(bytesOfImageFromFilesystem); // => 11dc0e49342a1ad15ab1b5a7f8bc271e
(Repeat steps 3 to 6 but re-use image from step 5:)
7. Store BufferedImage to filesystem
Path tempFile2 = Files.createTempFile(null, ".jpg");
ImageIO.write(bufferedImageFromFilesystem, "jpg", tempFile2.toFile());
8. Calculate MD5 hash of file
byte[] bytesOfFile2 = Files.readAllBytes(tempFile2);
DigestUtils.md5Hex(bytesOfFile2); // => 11dc0e49342a1ad15ab1b5a7f8bc271e
9. Load image from file system
BufferedImage bufferedImageFromFilesystem2 = ImageIO.read(tempFile2.toFile());
10. Calculate MD5 hash of image loaded from file system
ByteArrayOutputStream baosFS2 = new ByteArrayOutputStream();
ImageIO.write(bufferedImageFromFilesystem2, "jpg", baosFS2);
byte[] bytesOfImageFromFilesystem2 = baosFS2.toByteArray();
DigestUtils.md5Hex(bytesOfImageFromFilesystem2); // => d1102e4b7efef384623cac915a21e1c2
(org.apache.commons.codec.digest.DigestUtils is used for MD5 calculation)
Every time I save the image to the file system using code snippet #3 and load it again using code snippet #5, the image data gets altered. The size of the image shrinks by a few bytes. The image can still be opened by the standard Windows image viewer and appears to be valid.
I already checked whether the issue is caused by the image's metadata. Comparing the metadata of the JPG files with a dedicated tool does not show any difference.
How can I make sure that loading and saving an identical image does not change the file?
You're saving a JPEG, which is a lossy compressed image format, rather than the raw pixel buffer. Lossy means the process cannot be reversed, because information is discarded along the way: the encoder quantizes the image data to reduce its size. So when you load the file back, the decoded pixels differ from the original ones, and encoding them again produces different bytes, hence the changed hash. Each further save compresses the already-degraded image again, leading to yet another hash. I suspect that if you repeated this often enough, the image would eventually stop changing and the hash would settle. If you need the data to survive a save/load cycle unchanged, use a lossless format such as PNG, or hash the original file bytes without decoding and re-encoding them.
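To illustrate the difference, here is a minimal sketch that hashes the decoded pixels instead of the encoded file bytes and compares a PNG round trip with a JPEG round trip. The class name `StableImageHash` and the helpers `sampleImage`, `pixelHash` and `roundTrip` are made up for this example, and it uses the JDK's `MessageDigest` rather than `DigestUtils` so that it runs without extra dependencies. It assumes an opaque RGB image like the one in the question.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.imageio.ImageIO;

class StableImageHash {

    // Hypothetical helper: the same red square as in the question.
    static BufferedImage sampleImage() {
        BufferedImage image = new BufferedImage(400, 400, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = image.createGraphics();
        graphics.setColor(Color.RED);
        graphics.fillRect(100, 100, 200, 200);
        graphics.dispose();
        return image;
    }

    // MD5 over width, height and the ARGB value of every pixel.
    // This ignores how the image happens to be encoded on disk.
    static String pixelHash(BufferedImage image) throws NoSuchAlgorithmException {
        int width = image.getWidth();
        int height = image.getHeight();
        int[] argb = image.getRGB(0, 0, width, height, null, 0, width);

        ByteBuffer buffer = ByteBuffer.allocate(8 + argb.length * 4);
        buffer.putInt(width).putInt(height);
        buffer.asIntBuffer().put(argb); // view starts right after the two header ints

        byte[] digest = MessageDigest.getInstance("MD5").digest(buffer.array());
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Encode the image in the given format and decode it again, all in memory.
    static BufferedImage roundTrip(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, format, out);
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) throws Exception {
        BufferedImage original = sampleImage();
        System.out.println("original:        " + pixelHash(original));
        // PNG is lossless, so the decoded pixels should match the original.
        System.out.println("after PNG trip:  " + pixelHash(roundTrip(original, "png")));
        // JPEG is lossy, so the decoded pixels will generally differ.
        System.out.println("after JPEG trip: " + pixelHash(roundTrip(original, "jpg")));
    }
}
```

With PNG the first two hashes should be identical, because the decoded pixels are the same as the ones that were written. If the stored file has to stay a JPEG, the other option is to compute the hash once from the file bytes (as in step 4) and never re-encode the image.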