Search code examples
c#.netlinqnhibernatecomposite-key

Linq filtering a list by a composite key (contains and not contains)


I need an understanding of how to filter a list by a composite key with the conditions "contains" and "does not contain".

I have a list

List<Entity>();

where entity is a class

public class Entity
{
    int id;
    string docNum;
    int docVersion;
}

There is a similar second list. I need to get those elements of the list that match the following condition: "Get the elements of the first list, the document numbers of which are the same as the second, and the versions of the documents differ in the same document numbers".

I tried to use Contains() in conjunction with Select() like:

firstList
    .Where(x => secondList.Select(y => y.docNum).Contains(x.docNum))
    .Where(x => !secondList.Select(y => y.docVersion).Contains(x.docVersion))

but I'm afraid it does not take into account the fact that I need to compare document numbers as well.

It seems to me that this should be resolved through GroupBy() or ToDictionary(), but I can't get to the solution.

Please, if possible, show an example of a solution on standard linq elements, because I use NHibernate


Solution

  • You can do more complex comparisons using Any() rather than Select().Contains(). For instance, filtering if the composite key exists in secondList goes like this:

    .Where(x => !secondList.Any(y => y.docNum == x.docNum && y.docVersion == x.docVersion)
    

    This is slightly more efficient in this case than Select().Contains() since there's no intermediate object construction. In some cases where the intermediate object is more complex the savings go up.

    I'd recommend that you use Any() for your first condition as well, giving you something like:

    firstList
    .Where(x => x.Any(y => y.docNum == x.docNum))
    .Where(x => !secondList.Any(y => y.docNum == x.docNum && y.docVersion == x.docVersion)
    

    The first Where() will filter out any items that don't have a matching docNum in the list, the second will filter the remaining items to remove those that have a full composite match on the second list.

    Of course this is terribly inefficient since you have to scan secondList twice - partially, but still two scans. It would be better to do a lookup for version numbers keyed on document number and use that to do much more efficient filtering:

    var lu = secondList.ToLookup(y => y.docNum, y => y.docVersion);
    
    firstList.Where
    (
        x => 
        { 
            var l = lu[x.docNum]; 
            return l.Any() && !l.Any(v => v == x.docVersion);
        }
    )
    

    Since ILookup<> is reasonably well optimized the actual time taken to do the filter is greatly reduced. Using some very simple (mostly-)random list generation with reasonably high collisions I tested filtering 10,000 items against a list of 1,000,000 items to see what the difference would be. 49 seconds for the first option, 1.8 seconds with a lookup.

    That said, this is really only going to work well on IEnumerable<>. If you're operating on an IQueryable<> then stick with Any() and good indices.