I'm working on Java code that generates a checksum for a given file. I am using Google's Guava library for hashing. Here is the code -
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
private HashCode doHash(File file) throws IOException {
    HashFunction hc = Hashing.murmur3_128();
    HashCode hsCode = hc.newHasher()
            .putBytes(com.google.common.io.Files.asByteSource(file).read())
            .hash();
    return hsCode;
}
I ran this code on a file that was 2.8 GB in size, and it threw the following error -
Exception in thread "main" java.lang.OutOfMemoryError: 2945332859 bytes is too large to fit in a byte array
at com.google.common.io.ByteStreams.toByteArray(ByteStreams.java:232)
at com.google.common.io.Files$FileByteSource.read(Files.java:154)
...
Is there another data structure that I can use here? Or should I look for another strategy to feed the file to the hash function?
Guava's HashFunction doesn't know how to deal with a ByteSource, but a ByteSource knows how to deal with a HashFunction: ByteSource.hash(HashFunction) streams the file contents into the hasher, so the whole file never has to fit in a byte array. Just do it that way -
HashCode hsCode = Files.asByteSource(file).hash(hc);
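For completeness, here is a minimal sketch of the corrected method using the same imports as your snippet; the FileChecksum class and the main method are only illustrative scaffolding, not part of the original code:

import java.io.File;
import java.io.IOException;

import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import com.google.common.io.Files;

public class FileChecksum {

    private HashCode doHash(File file) throws IOException {
        HashFunction hf = Hashing.murmur3_128();
        // ByteSource.hash(HashFunction) copies the file into the hasher in
        // buffered chunks, so the 2.8 GB file is never loaded as one byte array.
        return Files.asByteSource(file).hash(hf);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileChecksum /path/to/file
        HashCode checksum = new FileChecksum().doHash(new File(args[0]));
        System.out.println(checksum);
    }
}

If you later switch algorithms (say, Hashing.sha256()), only the HashFunction changes; the streaming behaviour comes from ByteSource, not from the particular hash function.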