Search code examples
c#linqnullable

Why does Linq join on Nullable<T> not work the same as ==?


var list1 = new List<(string Foo, DayOfWeek? DayOfWeek)> {("Friday", DayOfWeek.Friday), ("Null", null)};
var list2 = new List<(int Num, DayOfWeek? DayOfWeek)> {(1, DayOfWeek.Friday), (2, DayOfWeek.Friday), (3, null)};

Console.WriteLine("Q1:");
var q1 = from l1 in list1
         from l2 in list2
         where l1.DayOfWeek == l2.DayOfWeek
         group l2.Num by l1.Foo
         into g
         select new {g.Key, Sum = g.Sum()};

foreach (var x in q1)
{
    Console.WriteLine($"{x.Key} = {x.Sum}");
}

Console.WriteLine("Q2:");
var q2 = from l1 in list1
         join l2 in list2 on l1.DayOfWeek equals l2.DayOfWeek
         group l2.Num by l1.Foo
         into g
         select new {g.Key, Sum = g.Sum()};

foreach (var x in q2)
{
    Console.WriteLine($"{x.Key} = {x.Sum}");
}

Output:

Q1:
Friday = 3
Null = 3
Q2:
Friday = 3

Note that Q2 does not have a value for "Null"!

What this tells me is that the Linq equals / .Join() does not use an == or .Equals() evaluation. So, what is it using, and is there a sensible reason why one Nullable<T> does not equate to another Nullable<T> when the value is null?


Solution

  • Nullable.Equals(Object) returns true when HasValue property is false, and the other parameter is null. It seems, that the first query is evaluated correctly.

    As for the second query I've changed it to

    var q2 = from l1 in list1
        join l2 in list2 on new { l1.DayOfWeek } equals new { l2.DayOfWeek }
        group l2.Num by l1.Foo
        into g
        select new { g.Key, Sum = g.Sum() };
    

    And got the same result with first query.

    The reason of this is that Enumerable.Join method works as inner join in SQL, according to MSDN

    If an element in the first collection has no matching elements, it does not appear in the result set. The Join method, which is called by the join clause in C#, implements an inner join.

    And null keys are omitted when lookup is created, according to sources.

    internal static Lookup<TKey, TElement> CreateForJoin(IEnumerable<TElement> source, Func<TElement, TKey> keySelector, IEqualityComparer<TKey> comparer) {
                Lookup<TKey, TElement> lookup = new Lookup<TKey, TElement>(comparer);
                foreach (TElement item in source) {
                    TKey key = keySelector(item);
                    if (key != null) lookup.GetGrouping(key, true).Add(item);
                }
                return lookup;
            }
    

    So, there is nothing related to object equality, the null values just skipped during join. I've created a custom EqualityComparer, passed it to Join method and didn't observe any null values coming to Equals method.

    If you try to use outer join or group join, and write something like this

    var q3 = list1.GroupJoin(list2, _ => _.DayOfWeek, _ => _.DayOfWeek, (l1, list) =>
        new
        {
            Name = l1.Foo,
            Num = list.Sum(_ => _.Num)
        }
    );
    

    or this

    var q3 = from l1 in list1
        join l2 in list2 on l1.DayOfWeek equals l2.DayOfWeek into weeks
        from week in weeks.DefaultIfEmpty()
        group week.Num by l1.Foo
        into g
        select new { g.Key, Sum = g.Sum() };
    

    you'll get the following output (correct number of rows, but incorrect sum)

    Q3:
    Friday = 3
    Null = 0
    

    because Num values are lost during outer join and grouping (missed keys are substituted by default values). It seems, that it's only working, when you use where clause without join or use workaround above with an anonymous type.

    When the key is anonymous type, the equality in join is evaluated correctly, because all anonymous object instances aren't null, created during equality and compared by the DayOfWeek? property inside instance (not the instance itself), according this thread