
How to compare two files to see if they are the same?


I used to think I could use a checksum (MD5 or CRC32) to improve my upload method: if two files have the same checksum, I treat them as the same file. But one day I saw the code in org.apache.commons.io.FileUtils, which contains two methods, contentEquals and contentEqualsIgnoreEOL. It checks whether two files are the same in two ways.

if (file1.getCanonicalFile().equals(file2.getCanonicalFile())) {
    // same file
    return true;
}

and

IOUtils.contentEquals(new FileInputStream(f1), new FileInputStream(f2));

Here is what confuses me.

  • I can't find enough information about "canonical". What does it mean?
  • It uses IO streams to compare the files instead of a checksum.

So, in which situations should I compare the bytes, and in which should I use checksums, to check whether two files are the same?


Solution

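    1. The first one compares the canonical file paths to see whether both refer to the same file. A canonical path is the absolute, unique form of a path, with redundant names such as "." and ".." removed and, on UNIX platforms, symbolic links resolved.
    2. The second one reads both files completely to see whether their contents are the same.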
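
The two checks above can be sketched in plain Java as follows. This is only an illustration, not the Commons IO implementation: the class name FileCompareSketch, the helpers samePath and sameContent, and the temporary file names are all made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper class, not part of Commons IO.
class FileCompareSketch {

    // Check 1: do both File objects resolve to the same canonical path?
    static boolean samePath(File a, File b) throws IOException {
        return a.getCanonicalFile().equals(b.getCanonicalFile());
    }

    // Check 2: do both files hold exactly the same bytes?
    static boolean sameContent(File a, File b) throws IOException {
        if (a.length() != b.length()) {
            return false; // different sizes can never have equal content
        }
        try (InputStream x = new BufferedInputStream(new FileInputStream(a));
             InputStream y = new BufferedInputStream(new FileInputStream(b))) {
            int bx;
            while ((bx = x.read()) != -1) {
                if (bx != y.read()) {
                    return false; // first mismatch ends the comparison
                }
            }
            return y.read() == -1; // both streams must end together
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("cmp").toFile();
        File original = new File(dir, "a.txt");
        File copy = new File(dir, "b.txt");
        Files.write(original.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(copy.toPath(), "hello".getBytes(StandardCharsets.UTF_8));

        // The same file reached through a redundant "." segment.
        File roundabout = new File(dir, "./a.txt");

        System.out.println(samePath(original, roundabout)); // true
        System.out.println(samePath(original, copy));       // false
        System.out.println(sameContent(original, copy));    // true
    }
}
```

The path check is cheap and catches the case where both arguments point at one file; only when it fails is it worth reading the contents.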

    Checksum

    1. If the two checksums are different, you can say with confidence that the files are different.
    2. If the two checksums are equal, you cannot say with confidence that the files are the same, because different contents can produce the same checksum (a collision).

    Checksums can be used as a quick check by computing and caching the checksum of each file up front.
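
A minimal sketch of that idea, assuming CRC32 from java.util.zip as the checksum and a byte comparison as the final confirmation. The class name ChecksumPrecheckSketch and its helpers are hypothetical, and the checksums are passed in as parameters to stand for values that were cached earlier.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper class for illustration only.
class ChecksumPrecheckSketch {

    // Streams the file through CRC32 so it is never loaded into memory at once.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    // crcA and crcB stand for checksums that were computed and cached up front.
    static boolean sameContent(Path a, long crcA, Path b, long crcB) throws IOException {
        if (crcA != crcB) {
            return false; // different checksums: the files definitely differ
        }
        // Equal checksums only make the files candidates, so confirm byte by byte.
        // readAllBytes keeps the sketch short; large files would need a streaming compare.
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crc");
        Path first = Files.write(dir.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Path copy = Files.write(dir.resolve("b.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Path other = Files.write(dir.resolve("c.txt"), "world".getBytes(StandardCharsets.UTF_8));

        long crcFirst = crc32(first);
        System.out.println(sameContent(first, crcFirst, copy, crc32(copy)));   // true
        System.out.println(sameContent(first, crcFirst, other, crc32(other))); // false
    }
}
```

With this arrangement the expensive read of both files happens only when the cached checksums already agree, which is the case the checksum alone cannot settle.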