I have a requirement to 'check the integrity' of the content of files. The files will be written to CD/DVD, which might be copied many times. The idea is to identify which copies (after they are removed from Nero etc.) were copied correctly.
I am rather new to this, but a quick search suggests that Arrays.hashCode(byte[]) will fit the need. We can include a file on the disk that contains the result of that call for each resource of interest, then compare it to the hash of the byte[] of the File as read from disk when checked.
Do I understand the method correctly? Is this a valid way to go about checking file content?
If not, suggestions as to search keywords or strategies/methods/classes would be appreciated.
Working code based on Brendan's answer. It takes care of the problem identified by VoidStar (needing to hold the entire byte[] in memory to get the hash).
import java.io.File;
import java.io.FileInputStream;
import java.util.zip.CRC32;

class TestHash {

    public static void main(String[] args) throws Exception {
        File f = new File("TestHash.java");
        CRC32 crcMaker = new CRC32();
        byte[] buffer = new byte[65536];
        // try-with-resources closes the stream even if a read fails
        try (FileInputStream fis = new FileInputStream(f)) {
            int bytesRead;
            // Feed the file through the CRC one buffer at a time
            while ((bytesRead = fis.read(buffer)) != -1) {
                crcMaker.update(buffer, 0, bytesRead);
            }
        }
        long crc = crcMaker.getValue(); // This is your error checking code
        System.out.println("CRC code is " + crc);
    }
}
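Arrays.hashCode() is designed to be very fast for use in hash tables, not to detect corruption: it is a simple 32-bit polynomial hash, so different byte sequences can easily produce the same value. I highly recommend not using it for this purpose.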
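To illustrate why a hash-table hash is a poor integrity check, here is a minimal sketch. The class name HashCollisionDemo, the helper crc32 and the two sample arrays are made up for this example. It relies on the documented behaviour of Arrays.hashCode(byte[]) (start at 1, then 31 * h + b for each element), under which these two different two-byte sequences hash to the same value, while CRC32 tells them apart.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

class HashCollisionDemo {

    // Hypothetical helper: CRC32 of a whole in-memory array.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Two different two-byte "files" (illustrative values only).
        byte[] original = {0, 31};
        byte[] corrupted = {1, 0};

        // 31 * (31 * 1 + 0) + 31 == 31 * (31 * 1 + 1) + 0 == 992
        boolean sameHash = Arrays.hashCode(original) == Arrays.hashCode(corrupted);
        boolean sameCrc = crc32(original) == crc32(corrupted);

        System.out.println("contents equal:          " + Arrays.equals(original, corrupted));
        System.out.println("Arrays.hashCode matches: " + sameHash);
        System.out.println("CRC32 matches:           " + sameCrc);
    }
}
```

With these inputs the contents differ, the hash codes match, and the CRCs differ, which is exactly the kind of silent miss you want to avoid when checking copies.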
What you want is some sort of error-checking code like a CRC.
Java happens to have a class for calculating these, java.util.zip.CRC32:
InputStream in = ...;
CRC32 crcMaker = new CRC32();
byte[] buffer = new byte[someSize];
int bytesRead;
// Read the stream in chunks so the whole file never has to be in memory
while ((bytesRead = in.read(buffer)) != -1) {
    crcMaker.update(buffer, 0, bytesRead);
}
long crc = crcMaker.getValue(); // This is your error checking code
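To connect this back to the question's plan of storing a value per file and comparing it later, here is a rough sketch of the check step. The class name CrcCheck, the methods crcOf and matches, and the temp-file demo in main are all assumptions made for illustration; how the recorded values are stored on the disc is left open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class CrcCheck {

    // Streams the file through CRC32 so the content is never fully in memory.
    static long crcOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[65536];
        try (InputStream in = Files.newInputStream(file)) {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                crc.update(buffer, 0, bytesRead);
            }
        }
        return crc.getValue();
    }

    // True if the file's current CRC equals the value recorded earlier.
    static boolean matches(Path file, long expectedCrc) throws IOException {
        return crcOf(file) == expectedCrc;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file on the disc: a temp file with known content.
        Path sample = Files.createTempFile("crc-demo", ".bin");
        try {
            Files.write(sample, "123456789".getBytes(StandardCharsets.US_ASCII));
            long recorded = crcOf(sample); // value you would store alongside the file

            System.out.println("intact copy matches:    " + matches(sample, recorded));

            // Simulate a bad copy by changing the last byte.
            Files.write(sample, "123456780".getBytes(StandardCharsets.US_ASCII));
            System.out.println("corrupted copy matches: " + matches(sample, recorded));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The first check should report true and the second false. In a real setup the recorded values would be read from whatever list you ship on the disc rather than computed in the same run.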