
IEnumerable.Except() on System.Linq.Enumerable+WhereSelectArrayIterator vs. List<T>


Maybe I am missing some details here, but I would expect IEnumerable.Except() to work on enumerables that have not been materialized into a concrete collection.

Let me explain with a simple example: I have a list of files in a directory and I want to exclude the files whose names start with a certain string.

var allfiles = Directory.GetFiles(@"C:\test\").Select(f => new FileInfo(f));

Getting both the files matching and those not matching would be a matter of identifying one of the two collections and then .Except()-ing on the full list, right?

var matching = allfiles.Where(f => f.Name.StartsWith("TOKEN"));

and

var notmatching = allfiles.Except(matching, new FileComparer());

Where FileComparer is a class that compares two files by name and length (shown below).

Well, unless I convert all three collections to a List, the notmatching variable still gives me the full list of files after I call .Except() with the matching collection. To be clear:

var allfiles = Directory.GetFiles(@"C:\test\").Select(f => new FileInfo(f));
var matching = allfiles.Where(f => f.Name.StartsWith("TOKEN"));
var notmatching = allfiles.Except(matching, new FileComparer());

does not exclude, while

var allfiles = Directory.GetFiles(@"C:\test\").Select(f => new FileInfo(f)).ToList();
var matching = allfiles.Where(f => f.Name.StartsWith("TOKEN")).ToList();
var notmatching = allfiles.Except(matching, new FileComparer()).ToList();

actually does what it says on the tin. What am I missing here? I can't understand why LINQ doesn't behave the same way on sequences that have not been converted to a list.

For instance, the FileComparer does not even get called in the first case.

internal class FileComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        // If either argument is null, they are equal only when both are null.
        return x == null || y == null ? ReferenceEquals(x, y) : (x.Name.Equals(y.Name, StringComparison.OrdinalIgnoreCase) && x.Length == y.Length);
    }

    public int GetHashCode(FileInfo obj)
    {
        return obj.GetHashCode();
    }
}

Solution

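  • The difference between the two is that without ToList, the deferred allfiles query is executed twice: once when Except enumerates matching (which is built on allfiles) and once when it enumerates allfiles itself. Each run creates new FileInfo instances, so the two sequences contain different objects that will not pass reference equality.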

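    Your FileComparer implements GetHashCode incorrectly: it simply returns the reference-based hash code of the FileInfo object (FileInfo does not itself override GetHashCode). Except compares hash codes first and only calls Equals for elements whose hash codes match, so two distinct instances describing the same file are never passed to Equals.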

    Implementations are required to ensure that if the Equals(T, T) method returns true for two objects x and y, then the value returned by the GetHashCode(T) method for x must equal the value returned for y.

    The solution is to implement GetHashCode based on the same definition of equality as Equals:

    public int GetHashCode(FileInfo obj)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
    }
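
To see both effects side by side, here is a minimal sketch rather than a drop-in replacement. It makes a few assumptions that are not in the question: the file names are a hypothetical in-memory array standing in for Directory.GetFiles, the comparers look at Name only (Length would throw for files that do not exist on disk), and the class names ReferenceHashComparer, NameHashComparer and ExceptDemo are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Mirrors the question's comparer: name-based Equals, reference-based GetHashCode.
internal sealed class ReferenceHashComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y) =>
        x == null || y == null
            ? ReferenceEquals(x, y)
            : x.Name.Equals(y.Name, StringComparison.OrdinalIgnoreCase);

    // Distinct FileInfo instances get unrelated hash codes, so Except
    // normally never reaches Equals for them.
    public int GetHashCode(FileInfo obj) => obj.GetHashCode();
}

// Same Equals, but the hash code is derived from the same data Equals uses.
internal sealed class NameHashComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y) =>
        x == null || y == null
            ? ReferenceEquals(x, y)
            : x.Name.Equals(y.Name, StringComparison.OrdinalIgnoreCase);

    public int GetHashCode(FileInfo obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
}

internal static class ExceptDemo
{
    // Hypothetical file names standing in for Directory.GetFiles(...).
    private static readonly string[] Names =
        { "TOKEN_a.txt", "TOKEN_b.txt", "other.txt", "readme.md" };

    // Deferred query: every enumeration constructs fresh FileInfo objects.
    public static IEnumerable<FileInfo> AllFiles() =>
        Names.Select(n => new FileInfo(Path.Combine(Path.GetTempPath(), n)));

    // Same shape as the question: no ToList, so allfiles is enumerated twice.
    public static int CountNotMatching(IEqualityComparer<FileInfo> comparer)
    {
        var allfiles = AllFiles();
        var matching = allfiles.Where(
            f => f.Name.StartsWith("TOKEN", StringComparison.Ordinal));
        return allfiles.Except(matching, comparer).Count();
    }

    private static void Main()
    {
        var allfiles = AllFiles();

        // Each First() call re-runs the query, so the instances differ.
        Console.WriteLine(ReferenceEquals(allfiles.First(), allfiles.First())); // False

        // Reference-based hash codes: nothing is expected to be excluded.
        Console.WriteLine(CountNotMatching(new ReferenceHashComparer()));

        // Name-based hash codes: the two TOKEN files are excluded.
        Console.WriteLine(CountNotMatching(new NameHashComparer())); // 2
    }
}
```

With the name-based hash code the result no longer depends on whether ToList was called, because equality is decided by file name rather than by object identity. Calling ToList once on allfiles is still worth considering if you want to avoid creating the FileInfo objects twice.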