Tags: c#, windows, linq, compare, diff

Better way to detect file differences between 2 directories?


I wrote some C# functions that roughly "diff" two directories, similar to KDiff3.

First, this function compares file names between the two directories. Any file name that appears in dir1 but not in dir2 implies that the file has been added to dir1:

using System.Collections.Generic;
using System.IO;
using System.Linq;

public static List<string> diffFileNamesInDirs(string dir1, string dir2)
{
    // Paths relative to dir1/dir2 (assumes both are absolute paths without a trailing separator)
    List<string> dir1FileNames = Directory
       .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
       .Select(Path.GetFullPath)
       .Select(entry => entry.Replace(dir1 + "\\", ""))
       .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    // Except already returns distinct elements, so Distinct() is not needed
    List<string> diffs = dir1FileNames.Except(dir2FileNames).ToList();

    return diffs;
}
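A possible tidier variant of the name comparison, sketched under a few assumptions: it uses `Path.GetRelativePath` (available in .NET Core 2.0 and later, not in .NET Framework) instead of string replacement, compares names case-insensitively as Windows does by default, and reports files missing from either side rather than only from dir2. The class `DirNameDiff` and its methods are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirNameDiff
{
    // Hypothetical helper: every file under `root`, as a path relative to `root`.
    // Assumes .NET Core 2.0+ (Path.GetRelativePath) and Windows-style
    // case-insensitive file names (StringComparer.OrdinalIgnoreCase).
    public static HashSet<string> GetRelativeFiles(string root)
    {
        return new HashSet<string>(
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                     .Select(path => Path.GetRelativePath(root, path)),
            StringComparer.OrdinalIgnoreCase);
    }

    // Returns the relative paths found only in dir1 and only in dir2, each sorted ordinally.
    public static (List<string> OnlyInDir1, List<string> OnlyInDir2) DiffFileNames(string dir1, string dir2)
    {
        HashSet<string> files1 = GetRelativeFiles(dir1);
        HashSet<string> files2 = GetRelativeFiles(dir2);

        List<string> onlyIn1 = files1.Where(f => !files2.Contains(f))
                                     .OrderBy(f => f, StringComparer.Ordinal)
                                     .ToList();
        List<string> onlyIn2 = files2.Where(f => !files1.Contains(f))
                                     .OrderBy(f => f, StringComparer.Ordinal)
                                     .ToList();
        return (onlyIn1, onlyIn2);
    }
}
```

Files in dir1 only correspond to the question's "added" case; files in dir2 only would be deletions. On a case-sensitive file system, dropping the `StringComparer.OrdinalIgnoreCase` argument is probably the safer choice.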

Second, this function compares the sizes of files whose names exist in both directories. A difference in file size implies that one of the files has been edited:

public static List<string> diffFileSizesInDirs(string dir1, string dir2)
{
    //Get list of file paths, relative to the base dir1/dir2 directories
    List<string> dir1FileNames = Directory
       .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
       .Select(Path.GetFullPath)
       .Select(entry => entry.Replace(dir1 + "\\", ""))
       .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    // Intersect already returns distinct elements, so Distinct() is not needed
    List<string> sharedFileNames = dir1FileNames.Intersect(dir2FileNames).ToList();

    //Get list of file sizes corresponding to file paths
    List<long> dir1FileSizes = sharedFileNames
        .Select(s =>
        new FileInfo(Path.Combine(dir1, s)) //Create the full file path as required for FileInfo objects
        .Length).ToList();
    List<long> dir2FileSizes = sharedFileNames
        .Select(s =>
        new FileInfo(Path.Combine(dir2, s)) //Create the full file path as required for FileInfo objects
        .Length).ToList();

    List<string> changedFiles = new List<string>();
    for (int i = 0; i < sharedFileNames.Count; i++)
    {
        //If file sizes are different, there must have been a change made to one of the files. 
        if (dir1FileSizes[i] != dir2FileSizes[i])
        {
            changedFiles.Add(sharedFileNames[i]);
        }
    }

    return changedFiles;
}
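The parallel `dir1FileSizes`/`dir2FileSizes` lists only line up because both are built from the same `sharedFileNames` order. A sketch that avoids that coupling indexes each directory by relative path and looks up the matching file directly. `DirSizeDiff`, `IndexByRelativePath`, and `DiffFileSizes` are hypothetical names; the sketch assumes .NET Core 2.0+ for `Path.GetRelativePath` and Windows-style case-insensitive names. Like the original, it still cannot detect an edit that leaves the size unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirSizeDiff
{
    // Maps each file's path relative to `root` to its FileInfo.
    // Assumes case-insensitive names; on a case-sensitive file system two names
    // differing only in case would collide and ToDictionary would throw.
    static Dictionary<string, FileInfo> IndexByRelativePath(string root)
    {
        var dirInfo = new DirectoryInfo(root);
        return dirInfo.EnumerateFiles("*", SearchOption.AllDirectories)
                      .ToDictionary(fi => Path.GetRelativePath(dirInfo.FullName, fi.FullName),
                                    fi => fi,
                                    StringComparer.OrdinalIgnoreCase);
    }

    // Returns relative paths that exist in both directories but whose sizes differ.
    public static List<string> DiffFileSizes(string dir1, string dir2)
    {
        Dictionary<string, FileInfo> files1 = IndexByRelativePath(dir1);
        Dictionary<string, FileInfo> files2 = IndexByRelativePath(dir2);

        var changed = new List<string>();
        foreach (KeyValuePair<string, FileInfo> pair in files1)
        {
            // Look up the same relative path in dir2 instead of relying on list positions.
            if (files2.TryGetValue(pair.Key, out FileInfo other) && other.Length != pair.Value.Length)
            {
                changed.Add(pair.Key);
            }
        }
        changed.Sort(StringComparer.Ordinal);
        return changed;
    }
}
```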

Lastly, combining the results gives a list of all files that have been added or edited between the directories:

List<string> nameDiffs = FileIO.diffFileNamesInDirs(dir1, dir2);
List<string> sizeDiffs = FileIO.diffFileSizesInDirs(dir1, dir2);
List<string> allDiffs = nameDiffs.Concat(sizeDiffs).ToList();

This approach generally works, but it feels sloppy, and it fails when a file is modified but keeps the same size: the two files are no longer binary equal, yet the size check cannot tell. Any suggestions for a better way?


Solution

  • You could use System.Security.Cryptography.MD5 to compute an MD5 hash of each file and compare the hashes.

    For example, using this method:

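    using System;
    using System.IO;
    using System.Security.Cryptography;

    // Returns the MD5 hash of the file's contents as a lowercase hex string
    public static string GetMd5Hash(string path)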
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = File.OpenRead(path))
            {
                var hash = md5.ComputeHash(stream);
                return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            }
        }
    }
    

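    This takes more time than reading sizes from FileInfo, because the full contents of every file must be read (how much longer depends on the number and size of the files). In return, matching hashes make it practically certain that the files are binary identical; MD5 collisions are possible in principle, but an accidental one is extremely unlikely.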
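One way to combine the question's size check with the MD5 approach is to compare sizes first and compute hashes only when the sizes match, since a size difference already proves the files differ. The sketch below assumes .NET Core 2.0+ (for `Path.GetRelativePath`), uses the hypothetical names `DirContentDiff` and `DiffFileContents`, and repeats `GetMd5Hash` so it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class DirContentDiff
{
    // Same hashing approach as the GetMd5Hash method in the answer.
    public static string GetMd5Hash(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLowerInvariant();
        }
    }

    // Returns relative paths present in both directories whose contents differ.
    // The cheap size check runs first; hashes are computed only for equal-size pairs.
    public static List<string> DiffFileContents(string dir1, string dir2)
    {
        var root1 = new DirectoryInfo(dir1);
        var root2 = new DirectoryInfo(dir2);
        var changed = new List<string>();

        foreach (FileInfo f1 in root1.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            string relative = Path.GetRelativePath(root1.FullName, f1.FullName);
            var f2 = new FileInfo(Path.Combine(root2.FullName, relative));
            if (!f2.Exists)
            {
                continue; // added or removed files are handled by the name comparison
            }

            // || short-circuits, so hashing is skipped whenever the sizes already differ.
            if (f1.Length != f2.Length || GetMd5Hash(f1.FullName) != GetMd5Hash(f2.FullName))
            {
                changed.Add(relative);
            }
        }
        changed.Sort(StringComparer.Ordinal);
        return changed;
    }
}
```

When each pair of files is compared only once, hashing still reads both files completely; a direct byte-by-byte comparison could stop at the first mismatch, so hashes pay off mainly when they are cached or reused across many comparisons.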