
How can I return the set difference between two FileInfo lists while ignoring the file extension?


I have two IEnumerable<FileInfo> lists that I would like to compare:

IEnumerable<FileInfo> list1 = dir1.GetFiles("*" + _ext1, SearchOption.AllDirectories);
IEnumerable<FileInfo> list2 = dir2.GetFiles("*" + _ext2, SearchOption.AllDirectories);

Here _ext1 and _ext2 are different file extensions. For example:

string _ext1 = ".jpg";
string _ext2 = ".png";

So list1 will look something like:

file1.jpg
file2.jpg
file3.jpg
file4.jpg
file5.jpg
file6.jpg

and list2 will look like:

file1.png
file2.png
file4.png

I want to find everything in list1 that is not present in list2, comparing file names without their extensions. I have tried the following:

List<string> list1FileNames = list1.Select(f => Path.GetFileNameWithoutExtension(f.FullName)).ToList();
List<string> list2FileNames = list2.Select(f => Path.GetFileNameWithoutExtension(f.FullName)).ToList();
var setDiff = list1FileNames.Except(list2FileNames);

This works and returns the following (note that the file extension is gone):

file3
file5
file6

However, what I really want is a list of FileInfo objects, not just the file name strings. I need other information such as the full path and the extension, so a list of file name strings will not do the job. How can I go about doing this?


Solution

  • If you're looking for speed, try this:

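    // Returns the files in list1 whose names (ignoring extension) do not occur in list2.
    private IEnumerable<FileInfo> GetUniqueFilesWithoutExtension(IEnumerable<FileInfo> list1, IEnumerable<FileInfo> list2)
    {
        // Collect the extension-less names from list2 for fast lookups.
        var d = new HashSet<string>();
        foreach (var fi in list2)
        {
            d.Add(Path.GetFileNameWithoutExtension(fi.FullName));
        }

        // Stream back every FileInfo from list1 whose name is not in the set.
        foreach (var fi in list1)
        {
            if (!d.Contains(Path.GetFileNameWithoutExtension(fi.FullName)))
            {
                yield return fi;
            }
        }
    }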

    Build a hash set of the file names (without extensions) from list2, then iterate through list1 and return only the items whose extension-less names do not appear in that set. Each lookup is O(1) on average, so the whole pass is roughly linear in the combined size of the two lists. Because the method uses yield return, results are streamed to the caller as they are found instead of being collected into a full list first. That also means nothing runs, including building the hash set, until the result is actually enumerated.
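
For comparison, here is a rough sketch of the same hash-set idea written with LINQ and run against the sample file names from the question. The class name SetDifferenceDemo, the method name ExceptByBaseName, and the case-insensitive comparer are assumptions made for illustration, not part of the original answer. The FileInfo objects are built from bare file names, so nothing needs to exist on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SetDifferenceDemo
{
    // Hypothetical helper: files in list1 whose base name does not occur in list2.
    public static IEnumerable<FileInfo> ExceptByBaseName(
        IEnumerable<FileInfo> list1, IEnumerable<FileInfo> list2)
    {
        // Assumption: compare names case-insensitively, as Windows file systems
        // usually do. Use StringComparer.Ordinal for case-sensitive matching.
        var names = new HashSet<string>(
            list2.Select(f => Path.GetFileNameWithoutExtension(f.Name)),
            StringComparer.OrdinalIgnoreCase);

        // Where is lazily evaluated, so matches are still streamed to the caller.
        return list1.Where(f => !names.Contains(Path.GetFileNameWithoutExtension(f.Name)));
    }

    public static void Main()
    {
        // Sample data mirroring the question: file1..file6 as .jpg, and 1, 2, 4 as .png.
        var list1 = new[] { 1, 2, 3, 4, 5, 6 }
            .Select(i => new FileInfo($"file{i}.jpg")).ToList();
        var list2 = new[] { 1, 2, 4 }
            .Select(i => new FileInfo($"file{i}.png")).ToList();

        foreach (var fi in ExceptByBaseName(list1, list2))
        {
            Console.WriteLine(fi.Name); // file3.jpg, file5.jpg, file6.jpg
        }
    }
}
```

Unlike the iterator version above, this variant builds the hash set as soon as the method is called and defers only the filtering. Both versions compare names alone, so with SearchOption.AllDirectories two files with the same base name in different subdirectories are treated as the same file.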