What is being returned by this GroupJoin in C#


I am having some trouble understanding GroupJoin in C# and was wondering if someone could clarify.

Say I have two lists of this object

public class Item
{
     public int Id { get; set;}

     public decimal Amount { get; set;}
}

var itemListOne = new List<Item>(){
    new Item{
        Id = 1,
        Amount = 100
    },
    new Item{
        Id = 2,
        Amount = 200
    },
    new Item{
        Id = 3,
        Amount = 300
    },
};

var itemListTwo = new List<Item>(){
    new Item{
        Id = 1,
        Amount = 400
    },
    new Item{
        Id = 2,
        Amount = 500
    }
};

I then am calling a LINQ query like this

var result = (from listOneItem in itemListOne
              join listTwoItem in itemListTwo on listOneItem.Id equals listTwoItem.Id into joinedItems
              from joinedItem in joinedItems.DefaultIfEmpty()
              select joinedItem != null ? joinedItem.Amount : 0).ToArray();

What I was looking for with this code was to join the two lists on their Id fields. Once joined I would select the amount value from the second list of items, and the leftover entry in the first list wouldn't get joined so it would return a 0 into the list.

And this code actually works. When I tested it out I could see the first two entries were 400 and 500, and the third was 0 because the first list has a third item that could not be joined onto the second list. But I don't understand why it works.

I was reading into GroupJoins and I thought it mentioned that the returned value would be entries from the outer list. So I expected to have an array of 100, 200, and 300 because the outer list is itemListOne right? But it looks like it's actually giving me the values from the second list.


Solution

  • Your Linq expression...

    var result = (
            from listOneItem in itemListOne
            join listTwoItem in itemListTwo on listOneItem.Id equals listTwoItem.Id into joinedItems
            from joinedItem in joinedItems.DefaultIfEmpty()
            select joinedItem != null ? joinedItem.Amount : 0
        )
        .ToArray();
    

    ...is equivalent to this:

    decimal[] result = itemListOne
        .GroupJoin(
            inner: itemListTwo,
            outerKeySelector: i1 => i1.Id,
            innerKeySelector: i2 => i2.Id,
            resultSelector  : ( Item i1, IEnumerable<Item> i2Items ) => new { i1, i2Items }
        )
        .SelectMany(
            collectionSelector: t => t.i2Items.DefaultIfEmpty(),     // `t` is the Anonymous Type above.
            resultSelector    : ( t, i2 ) => i2?.Amount ?? 0M
        )
        .ToArray();
    

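    Notice how the i1 objects are passed into GroupJoin's resultSelector, which wraps each of them together with its matching i2Items in a new anonymously typed object; SelectMany then flattens those objects. GroupJoin does produce one result per outer item, but your select reads joinedItem, which comes from the inner list, and that is why you see amounts from itemListTwo.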
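
    To see why the output comes from the second list, it may help to keep both sides in the projection. The following is a minimal sketch, not part of the original answer: the `JoinDemo` class, the `BothSides` method, and the tuple field names are made up for illustration, and the data is the sample data from the question.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Item
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
    }

    public static class JoinDemo
    {
        // Hypothetical helper: same join as in the question, but the
        // projection keeps the outer amount next to the inner amount.
        public static (decimal OuterAmount, decimal InnerAmount)[] BothSides(
            List<Item> outer, List<Item> inner)
        {
            return (from listOneItem in outer
                    join listTwoItem in inner on listOneItem.Id equals listTwoItem.Id into joinedItems
                    from joinedItem in joinedItems.DefaultIfEmpty()
                    select (OuterAmount: listOneItem.Amount,
                            InnerAmount: joinedItem != null ? joinedItem.Amount : 0m)).ToArray();
        }

        public static void Main()
        {
            var itemListOne = new List<Item>
            {
                new Item { Id = 1, Amount = 100 },
                new Item { Id = 2, Amount = 200 },
                new Item { Id = 3, Amount = 300 },
            };
            var itemListTwo = new List<Item>
            {
                new Item { Id = 1, Amount = 400 },
                new Item { Id = 2, Amount = 500 },
            };

            // One line per outer item; the right-hand value is what the
            // question's select returns.
            foreach (var pair in BothSides(itemListOne, itemListTwo))
            {
                Console.WriteLine($"{pair.OuterAmount} -> {pair.InnerAmount}");
            }
        }
    }
    ```

    Running this should print `100 -> 400`, `200 -> 500` and `300 -> 0`: there is one row per item in itemListOne, and the question's query simply chooses to return the right-hand column.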


    "Once joined I would select the Amount from the second list of items, and the leftover entry in the first list wouldn't get joined so it would return a 0 into the list."

    That is what .DefaultIfEmpty() does for you: for the final item, the expression i2Items.DefaultIfEmpty() evaluates to a sequence containing a single null element instead of an empty IEnumerable<Item>, and the null check in your select turns that element into 0.
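
    The behaviour of DefaultIfEmpty() on its own can be checked in isolation. This is a small sketch under the assumption that Item is a class (as in the question), so default(Item) is null; the `DefaultIfEmptyDemo` class name is made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Item
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
    }

    public static class DefaultIfEmptyDemo
    {
        public static void Main()
        {
            // Stand-in for the empty group that GroupJoin produces for Id = 3.
            IEnumerable<Item> noMatches = Enumerable.Empty<Item>();

            // For an empty source, DefaultIfEmpty() yields exactly one element
            // equal to default(Item), which is null for a reference type.
            var padded = noMatches.DefaultIfEmpty().ToArray();
            Console.WriteLine(padded.Length);      // 1
            Console.WriteLine(padded[0] == null);  // True

            // A non-empty source passes through unchanged.
            var oneMatch = new[] { new Item { Id = 1, Amount = 400 } };
            Console.WriteLine(oneMatch.DefaultIfEmpty().Count());  // 1
        }
    }
    ```

    The single null element is what gives the flattening step something to emit for the unmatched outer item.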

    So if you remove .DefaultIfEmpty and the ?? bits it still works but outputs only 2 decimal values, not 3.

    decimal[] result = itemListOne
        .GroupJoin(
            inner: itemListTwo,
            outerKeySelector: i1 => i1.Id,
            innerKeySelector: i2 => i2.Id,
            resultSelector  : ( Item i1, IEnumerable<Item> i2Items ) => new { i1, i2Items }
        )
        .SelectMany(
            collectionSelector: t => t.i2Items,     // `t` is the Anonymous Type above.
            resultSelector    : ( t, i2 ) => i2.Amount
        )
        .ToArray();
    

    With this simplification, here's what the intermediate results look like after each Linq step:

    • GroupJoin:

      new[] {
          new
          {
              i1      = new Item { Id = 1, Amount = 100 },
              i2Items = new[] { new Item { Id = 1, Amount = 400 } }
          },
          new
          {
              i1      = new Item { Id = 2, Amount = 200 },
              i2Items = new[] { new Item { Id = 2, Amount = 500 } }
          },
          new
          {
              i1      = new Item { Id = 3, Amount = 300 },
              i2Items = new Item[0] // empty: nothing in itemListTwo has Id 3
          },
      }
      
    • SelectMany:

      new[] {
          400M,
          500M
      }
      

    Seeing as none of the i1 objects are used after the GroupJoin matching completes, you can simplify it further by omitting i1 from GroupJoin's output:

    decimal[] result = itemListOne
        .GroupJoin(
            inner: itemListTwo,
            outerKeySelector: i1 => i1.Id,
            innerKeySelector: i2 => i2.Id,
            resultSelector  : ( Item i1, IEnumerable<Item> i2Items ) => i2Items
        )
        .SelectMany( i2Items => i2Items )
        .Select( i2 => i2.Amount )
        .ToArray();
    

    ...and move the .Select( i2 => i2.Amount ) up into resultSelector:

    decimal[] result = itemListOne
        .GroupJoin(
            inner: itemListTwo,
            outerKeySelector: i1 => i1.Id,
            innerKeySelector: i2 => i2.Id,
            resultSelector  : ( Item i1, IEnumerable<Item> i2Items ) => i2Items.Select( i2 => i2.Amount )
        )
        .SelectMany( amounts => amounts )
        .ToArray();
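
    As a quick check, the question's query and the final simplified form can be run side by side. This is only a sketch: the `FinalFormDemo` class and the `QuestionQuery` and `FinalForm` method names are made up for illustration, and the data is the sample data from the question.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Item
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
    }

    public static class FinalFormDemo
    {
        // The query from the question (with DefaultIfEmpty and the null check).
        public static decimal[] QuestionQuery(List<Item> one, List<Item> two) =>
            (from listOneItem in one
             join listTwoItem in two on listOneItem.Id equals listTwoItem.Id into joinedItems
             from joinedItem in joinedItems.DefaultIfEmpty()
             select joinedItem != null ? joinedItem.Amount : 0).ToArray();

        // The last simplified form above (no DefaultIfEmpty).
        public static decimal[] FinalForm(List<Item> one, List<Item> two) =>
            one.GroupJoin(
                    two,
                    i1 => i1.Id,
                    i2 => i2.Id,
                    (i1, i2Items) => i2Items.Select(i2 => i2.Amount))
               .SelectMany(amounts => amounts)
               .ToArray();

        public static void Main()
        {
            var itemListOne = new List<Item>
            {
                new Item { Id = 1, Amount = 100 },
                new Item { Id = 2, Amount = 200 },
                new Item { Id = 3, Amount = 300 },
            };
            var itemListTwo = new List<Item>
            {
                new Item { Id = 1, Amount = 400 },
                new Item { Id = 2, Amount = 500 },
            };

            Console.WriteLine(string.Join(", ", QuestionQuery(itemListOne, itemListTwo)));
            Console.WriteLine(string.Join(", ", FinalForm(itemListOne, itemListTwo)));
        }
    }
    ```

    The first line should read `400, 500, 0` and the second `400, 500`. The simplified forms drop the unmatched outer item, so keep .DefaultIfEmpty() if you need the trailing 0.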