Search code examples
c#linqperformancelinq-to-objectssql-order-by

OrderBy and Top in LINQ with good performance


What is a good way to get the top 10 records from a very large collection and use a custom OrderBy? If I use the LINQ to Objects OrderBy method it is slow and takes a lot of memory because it creates an entire new collection with the new order. I would like a new method with the signature below that does not re-order the entire collection and is very fast:

public static IEnumerable<TSource> OrderByTop<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer,
    int topCount)

I tried to write it but it got very complicated and I thought there might be any easier way using Aggregate or something. Any help would be appreciated.


Solution

  • Aggregate is a good place to start with:

    SortedList<TKey, TSource> resultlist = new SortedList<TKey, TSource>();
    MyBigList.Aggregate(resultlist, (aktlist,entry) => {
       aktlist.Add(entry.Key, entry);
       if (aktlist.Count > 10) aktlist.RemoveAt(10);
       return aktlist;
    });
    

    If you want a different comparer, you can specify one in the constructor of the SortedList.

    EDIT As mentioned by nikie, SortedListcannot contain double values. You can use a standard list together with BinarySearch to achieve the same effect:

    List<TSource> resultlist = new List<TSource>();
    MyBigList.Aggregate(resultlist, (aktlist, entry) => {
       int index = aktlist.BinarySearch(entry);
       if (index < 0) index = ~index;
       if (index < 10) aktlist.Insert(index, entry);
       if (aktlist.Count > 10) aktlist.RemoveAt(10);
       return aktlist;
    });
    

    Again a custom comparer (together with a custom key selection) can be used as parameter to BinarySearch.