c# .net linq morelinq

Performance difference between GroupBy and MoreLinq's DistinctBy


Reading this question (and answer), I found out that there are at least two ways of getting distinct items from an IQueryable while still choosing what to filter by. The two methods are:

table.GroupBy(x => x.field).Select(x => x.FirstOrDefault());

or using MoreLINQ's DistinctBy:

table.DistinctBy(x => x.field);

But that thread doesn't explain the performance difference or when I should use one over the other. So when should I prefer each?


Solution

  • The two do very different things, so a performance difference is expected. GroupBy builds a collection of elements for each key in the original collection before passing the groups to Select. DistinctBy only needs to keep a hash set recording whether it has already encountered a key, so it can be much faster.

    If DistinctBy is enough for you, always use it; only use GroupBy if you need the elements in each group.

    Also, DistinctBy will not work with query providers such as LINQ to EF, because it operates on in-memory sequences (IEnumerable<T>) and is not translated to SQL.
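
To make the difference concrete, here is a minimal in-memory sketch of the two approaches. It is not MoreLINQ's actual source: `DistinctByKey` and `DistinctViaGroupBy` are hypothetical helper names chosen for illustration (and to avoid clashing with the library's own `DistinctBy`), and the sample rows are made up. It only illustrates the idea described above: the hash-set version remembers keys, while the GroupBy version materializes every group before taking its first element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Hypothetical helper: keeps one HashSet of keys and streams the results.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present,
            // so only the first item for each key is yielded.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }

    // Hypothetical helper: the GroupBy form from the question, in memory.
    // Every element is stored in its group before the first one is picked.
    public static IEnumerable<T> DistinctViaGroupBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        // Groups are never empty, so First() is safe here.
        return source.GroupBy(keySelector).Select(g => g.First());
    }

    public static void Main()
    {
        // Made-up sample data: two rows share the key "a".
        var rows = new[]
        {
            (Id: 1, Field: "a"),
            (Id: 2, Field: "b"),
            (Id: 3, Field: "a"),
        };

        var viaSet = rows.DistinctByKey(r => r.Field).Select(r => r.Id);
        var viaGroup = rows.DistinctViaGroupBy(r => r.Field).Select(r => r.Id);

        Console.WriteLine(string.Join(",", viaSet));   // 1,2
        Console.WriteLine(string.Join(",", viaGroup)); // 1,2
    }
}
```

Both helpers return the same items for in-memory sequences. The difference is in the work done: the GroupBy version holds every source element in a group, whereas the hash-set version holds only the distinct keys and can start yielding results before the source has been fully read.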