c# linq morelinq

What is the difference between MoreLINQ's DistinctBy and LINQ's GroupBy?


I have two versions of code that reduce a list of items to one item per key. The first uses DistinctBy:

List<m_addtlallowsetup> xlist_distincted = xlist_addtlallowsetups.DistinctBy(p => new { p.setupcode, p.allowcode }).OrderBy(y => y.setupcode).ThenBy(z => z.allowcode).ToList();

and the second uses GroupBy:

List<m_addtlallowsetup> grouped = xlist_addtlallowsetups.GroupBy(p => new { p.setupcode, p.allowcode }).Select(grp => grp.First()).OrderBy(y => y.setupcode).ThenBy(z => z.allowcode).ToList();

These two look equivalent to me, but there must be a layman's explanation of how they differ, including their performance and any disadvantages.


Solution

  • Let's review the MoreLINQ API first. The following is the source code for DistinctBy:

    MoreLinq - DistinctBy

    Source Code

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        return _(); IEnumerable<TSource> _()
        {
            var knownKeys = new HashSet<TKey>(comparer);
            foreach (var element in source)
            {
                if (knownKeys.Add(keySelector(element)))
                    yield return element;
            }
        }
    }
    

    Working

    • A HashSet<TKey> tracks the keys seen so far. The first element with a given key is returned; later elements with the same key are skipped, because Add returns false once the key is already in the set.
    • It is the simplest way to get the first element for every unique key in the collection, where the key is defined by the Func<TSource, TKey> keySelector.
    • Its use case is limited: it covers a subset of what GroupBy can achieve, as your code also shows.
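
    The first-match behaviour described above can be seen in a minimal sketch. To keep it runnable without the MoreLINQ package, it uses a hypothetical stand-in named DistinctByKey that mirrors the quoted source with the default comparer; the tuple row type and its amount field are assumptions made for illustration, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctBySketch
{
    // Hypothetical stand-in for MoreLINQ's DistinctBy (default comparer only),
    // written after the source quoted above so no package is required.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var knownKeys = new HashSet<TKey>();
        foreach (var element in source)
        {
            // Add returns false when the key was seen before, so only the
            // first element per key is yielded.
            if (knownKeys.Add(keySelector(element)))
                yield return element;
        }
    }

    public static List<(string setupcode, string allowcode, int amount)> Run()
    {
        // Illustrative data; the amount column is an assumption.
        var rows = new (string setupcode, string allowcode, int amount)[]
        {
            ("S1", "A1", 10),
            ("S1", "A1", 25), // duplicate key, skipped
            ("S2", "A1", 5),
            ("S1", "A1", 40), // duplicate key, skipped
        };

        // Anonymous types compare by value, so they work as composite keys.
        return rows.DistinctByKey(p => new { p.setupcode, p.allowcode }).ToList();
    }

    public static void Main()
    {
        foreach (var row in Run())
            Console.WriteLine(row);
        // prints:
        // (S1, A1, 10)
        // (S2, A1, 5)
    }
}
```

    Only the first row for each (setupcode, allowcode) pair survives; the amounts 25 and 40 are never returned, which is exactly why this API cannot aggregate.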

    Enumerable - GroupBy

    Source Code

    public static IEnumerable<IGrouping<TKey, TElement>> GroupBy<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector) {
        return new GroupedEnumerable<TSource, TKey, TElement>(source, keySelector, elementSelector, null);
    }

    internal class GroupedEnumerable<TSource, TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>
    {
        IEnumerable<TSource> source;
        Func<TSource, TKey> keySelector;
        Func<TSource, TElement> elementSelector;
        IEqualityComparer<TKey> comparer;

        public GroupedEnumerable(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector, IEqualityComparer<TKey> comparer) {
            if (source == null) throw Error.ArgumentNull("source");
            if (keySelector == null) throw Error.ArgumentNull("keySelector");
            if (elementSelector == null) throw Error.ArgumentNull("elementSelector");
            this.source = source;
            this.keySelector = keySelector;
            this.elementSelector = elementSelector;
            this.comparer = comparer;
        }

        public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator() {
            return Lookup<TKey, TElement>.Create<TSource>(source, keySelector, elementSelector, comparer).GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator() {
            return GetEnumerator();
        }
    }
    

    Working

    • As the source shows, GroupBy internally uses a Lookup data structure to collect all the elements for each key. The Lookup is built when enumeration starts, so the whole source is read before the first group is returned.
    • It offers flexible element and result selection via projection, so it can serve many different use cases.
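
    As a rough illustration of that flexibility, the sketch below computes a count and a total per key, something a first-element-only API cannot do. The tuple row type, the amount field and the Summarize name are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySketch
{
    public static List<(string setupcode, string allowcode, int Count, int Total)> Summarize()
    {
        // Illustrative data; the amount column is an assumption.
        var rows = new (string setupcode, string allowcode, int amount)[]
        {
            ("S1", "A1", 10),
            ("S1", "A1", 25),
            ("S2", "A1", 5),
        };

        return rows
            // Every element is kept in its group, not just the first one.
            .GroupBy(p => new { p.setupcode, p.allowcode })
            // Project each group to an aggregate over all of its elements.
            .Select(g => (g.Key.setupcode, g.Key.allowcode,
                          Count: g.Count(), Total: g.Sum(r => r.amount)))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Summarize())
            Console.WriteLine(line);
        // prints:
        // (S1, A1, 2, 35)
        // (S2, A1, 1, 5)
    }
}
```

    Replacing the projection with g => g.First() gives the deduplication from the question; the grouping work is the same either way.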

    Summary

    1. MoreLINQ's DistinctBy achieves a small subset of what Enumerable.GroupBy can achieve. If your use case is that specific, use the MoreLINQ API.
    2. For your use case, DistinctBy should be faster because its scope is narrower. GroupBy first collects every element into its group and only then lets you select the first element for each unique key; DistinctBy stores only the keys and skips every element after the first one for each key.
    3. If all you need is the first element per key, with no data projection, MoreLINQ is the better choice.
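
    One way to observe the difference described in point 2 is to count how many source elements each approach reads before it can hand back its first result. This is a sketch, not a benchmark: FirstPerKey is a hypothetical stand-in for MoreLINQ's DistinctBy, and the 1000-element source is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingSketch
{
    // Hypothetical stand-in for MoreLINQ's DistinctBy (default comparer only).
    public static IEnumerable<T> FirstPerKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var knownKeys = new HashSet<TKey>();
        foreach (var element in source)
            if (knownKeys.Add(keySelector(element)))
                yield return element;
    }

    public static (int distinctPulls, int groupPulls) Measure()
    {
        int pulled = 0;

        // Lazy source that counts how many elements consumers actually read.
        IEnumerable<int> Source()
        {
            for (int i = 0; i < 1000; i++)
            {
                pulled++;
                yield return i % 10; // ten distinct keys
            }
        }

        // Streaming: the first result is available after one element.
        _ = FirstPerKey(Source(), x => x).First();
        int distinctPulls = pulled;

        // GroupBy builds its Lookup from the whole source before yielding.
        pulled = 0;
        _ = Source().GroupBy(x => x).Select(g => g.First()).First();
        int groupPulls = pulled;

        return (distinctPulls, groupPulls);
    }

    public static void Main()
    {
        var (d, g) = Measure();
        Console.WriteLine($"DistinctBy-style pulled {d}, GroupBy pulled {g}");
        // prints: DistinctBy-style pulled 1, GroupBy pulled 1000
    }
}
```

    In the question's code the OrderBy that follows reads the whole sequence anyway, so the remaining saving there is mainly that DistinctBy keeps only keys in memory rather than every element.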

    This is a classic case in LINQ: more than one API can produce the same result, but we need to be wary of the cost, since GroupBy is designed for a much wider range of tasks than what you need from DistinctBy.