hash, large-data

How to calculate a hash of a file that is 1 terabyte or larger?


So, I have a couple of system backup image files that are around 1 terabyte each, and I want to calculate the hash of each one quickly (preferably SHA-1).

At first I tried to calculate the MD5 hash, but after 2 hours it still hadn't finished (which is perhaps to be expected for files as large as 1 TB).

So is there any program/implementation out there that can hash a 1TB file quickly?

I have heard of tree hashing, which hashes parts of a file simultaneously, but I haven't found any implementation so far.


Solution

  • If you have a 1 TB file, and your system can read this file at 100 MB/s, then:

    • 1 TB * 1000 (GB/TB) = 1000 GB
    • 1000 GB * 1000 (MB/GB) = 1 million MB
    • 1 million MB / 100 (MB/s) = 10 thousand seconds
    • 10000 s / 3600 (s/hr) = 2.77... hr
    • Therefore, a 100 MB/s system has a hard floor of 2.77... hrs just to read the file in the first place, before whatever additional time may be required to compute a hash.

    Your expectations are probably unrealistic - don't try to calculate a faster hash until you can perform a faster file read.
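
For what it's worth, here is a rough Python sketch of the arithmetic above, plus a chunked SHA-1 that reports the read throughput it actually achieved. The function names and the 1 MiB chunk size are my own arbitrary choices, not taken from any particular tool, and decimal units (1 TB = 10^12 bytes) are assumed to match the calculation.

```python
import hashlib
import time


def read_time_floor_hours(size_bytes, read_bytes_per_s):
    """Lower bound on hashing time: the time needed just to read the file."""
    return size_bytes / read_bytes_per_s / 3600


def sha1_streaming(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks; return (hex digest, observed MB/s).

    chunk_size is an assumed default (1 MiB), not a tuned value.
    """
    digest = hashlib.sha1()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return digest.hexdigest(), mb_per_s


if __name__ == "__main__":
    # 1 TB (decimal) read at 100 MB/s, as in the calculation above
    print(f"{read_time_floor_hours(1e12, 100e6):.2f} hours")
```

Try `sha1_streaming` on a smaller file from the same disk first: if the MB/s it reports is already close to the disk's sequential read speed, the hash function isn't your bottleneck and a faster algorithm won't help.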