Tags: c#, compare, benchmarking, checksum

Compare files byte by byte or read all bytes?


I came across this code, http://support.microsoft.com/kb/320348, which made me wonder what the best way is to compare two files to find out whether they differ.

The goal is to optimize my program, which has to check whether each file is equal or not so it can build a list of changed files and/or files to delete or create.
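A minimal sketch of how those three lists could be built, assuming the program compares a source directory tree against a target tree by relative path. `DirectoryDiff.Build`, `ChangeLists` and the `filesEqual` delegate are hypothetical names for illustration; `filesEqual` is whichever per-file comparison you settle on. Case-insensitive path matching (`StringComparer.OrdinalIgnoreCase`) is an assumption suited to Windows, and `Path.GetRelativePath` needs .NET Core 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical result type: relative paths grouped by what has to happen to them.
public class ChangeLists
{
    public List<string> ToCreate { get; } = new List<string>(); // only in source
    public List<string> ToDelete { get; } = new List<string>(); // only in target
    public List<string> Changed { get; } = new List<string>();  // in both, content differs
}

public static class DirectoryDiff
{
    // Hypothetical helper: pairs files by relative path and only calls
    // filesEqual for files that exist on both sides.
    public static ChangeLists Build(string sourceDir, string targetDir,
                                    Func<string, string, bool> filesEqual)
    {
        HashSet<string> source = RelativeFiles(sourceDir);
        HashSet<string> target = RelativeFiles(targetDir);
        var result = new ChangeLists();

        foreach (string rel in source)
        {
            if (!target.Contains(rel))
                result.ToCreate.Add(rel);
            else if (!filesEqual(Path.Combine(sourceDir, rel), Path.Combine(targetDir, rel)))
                result.Changed.Add(rel);
        }

        foreach (string rel in target)
        {
            if (!source.Contains(rel))
                result.ToDelete.Add(rel);
        }

        return result;
    }

    // Every file under root (recursively), as a path relative to root.
    // Assumption: paths are matched case-insensitively, as on Windows.
    private static HashSet<string> RelativeFiles(string root)
    {
        return new HashSet<string>(
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                     .Select(path => Path.GetRelativePath(root, path)),
            StringComparer.OrdinalIgnoreCase);
    }
}
```

Keeping the per-file comparison behind a delegate means the size-then-checksum check and a byte-by-byte check can be swapped and benchmarked without touching the directory walk. The lists come out in no particular order because they are filled from hash sets.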

Currently I compare the sizes of the two files, and if they match I compute an MD5 checksum of both. But after looking at the code linked at the beginning of this question, I wonder whether it is really worth using that approach instead of creating a checksum of the two files (which basically means reading all the bytes anyway)?
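A sketch of the size-then-MD5 approach described in the question, assuming a straightforward implementation with `System.Security.Cryptography.MD5`; the class and method names are hypothetical. Note that both hashes can only be computed by reading every byte of both files, even when they differ in the first byte.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class Md5Compare
{
    // Hypothetical helper: equal lengths first, then equal MD5 hashes.
    public static bool FilesAreEqual(string pathA, string pathB)
    {
        // Cheap check: a different length means different content.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        // Each hash reads its whole file, wherever the first difference is.
        byte[] hashA = ComputeMd5(pathA);
        byte[] hashB = ComputeMd5(pathB);
        return hashA.SequenceEqual(hashB);
    }

    private static byte[] ComputeMd5(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }
}
```

Hashing mainly pays off when a hash can be stored and reused (for example, the hash of the old copy saved from a previous run), because then only one file has to be read. When both files are hashed fresh on every comparison, the full read of both is unavoidable.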

Also, what other checks should I make to reduce the work of checking each file?


Solution

  • Read both files in chunks into small buffers (4K or 8K), which is optimised for reading, and then compare the buffers in memory byte by byte, which is optimised for comparing.

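    This gives good performance in all cases (whether the difference is at the start, in the middle or at the end), because you can stop reading at the first mismatch, whereas an MD5 checksum always has to read both files in full.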

    Of course, the first step is to check whether the file lengths differ; if they do, the files are certainly different.
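The steps in this answer (length check, chunked reads, in-memory comparison with an early exit) can be sketched as follows. This is one possible implementation, not code from the linked article; the class name and the 8 KB buffer size are illustrative choices.

```csharp
using System;
using System.IO;

public static class BufferedCompare
{
    // 8K, one of the buffer sizes suggested above.
    private const int BufferSize = 8 * 1024;

    public static bool FilesAreEqual(string pathA, string pathB)
    {
        // Step 1: different lengths mean different files; nothing to read.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        var bufferA = new byte[BufferSize];
        var bufferB = new byte[BufferSize];

        using (FileStream streamA = File.OpenRead(pathA))
        using (FileStream streamB = File.OpenRead(pathB))
        {
            while (true)
            {
                // Step 2: read the next chunk of each file.
                int countA = ReadFully(streamA, bufferA);
                int countB = ReadFully(streamB, bufferB);

                if (countA != countB)
                    return false; // a file changed length while being read
                if (countA == 0)
                    return true;  // both ended with no mismatch found

                // Step 3: compare in memory and stop at the first difference.
                for (int i = 0; i < countA; i++)
                {
                    if (bufferA[i] != bufferB[i])
                        return false;
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the end of the stream is reached.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }
}
```

Filling each buffer completely keeps the two chunks aligned, so the byte loop always compares the same file offsets. On .NET Core 2.1 and later, the inner loop could be replaced with `bufferA.AsSpan(0, countA).SequenceEqual(bufferB.AsSpan(0, countB))`, which is typically vectorised.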