Search code examples
c#linqdictionarysorteddictionary

Fastest way to filter a dictionary and "simplify" its values in C#


In C#, given a SortedDictionary, I need to filter on its keys and then "simplify" its values. This is best explained by the following MWE, which does exactly what I want

static void Main()
{
    var lowerBound = new DateTime(2018, 01, 02);
    var upperBound = new DateTime(2018, 01, 04);

    var myInput = new SortedDictionary<DateTime, SimpleItem>();

    myInput.Add(new DateTime(2018, 01, 01), new SimpleItem { item1 = 1.1, item2 = 2.1 });
    myInput.Add(new DateTime(2018, 01, 02), new SimpleItem { item1 = 1.2, item2 = 2.2 });
    myInput.Add(new DateTime(2018, 01, 03), new SimpleItem { item1 = 1.3, item2 = 2.3 });
    myInput.Add(new DateTime(2018, 01, 04), new SimpleItem { item1 = 1.4, item2 = 2.4 });
    myInput.Add(new DateTime(2018, 01, 05), new SimpleItem { item1 = 1.5, item2 = 2.5 });
    myInput.Add(new DateTime(2018, 01, 06), new SimpleItem { item1 = 1.6, item2 = 2.6 });
    myInput.Add(new DateTime(2018, 01, 07), new SimpleItem { item1 = 1.7, item2 = 2.7 });

    var q = myInput.Where(x => x.Key >= lowerBound && x.Key <= upperBound);

    Dictionary<DateTime, double> d = 
                  q.ToDictionary(x => x.Key, x => x.Value.item1);

    SortedDictionary<DateTime, double> myOutput = 
                  new SortedDictionary<DateTime, double>(d);

    int wait = 0;
}

class SimpleItem
{
    public double item1 { get; set; }
    public double item2 { get; set; }
}

By profiling my actual code (not this MWE), it is quite clear that ToDictionary is extremely slow (all the other parts seem ok). So I m simply asking for another way (hopefully the fastest) to do exactly the same thing.


Solution

  • Your problem is your filtering of the SortedDictionary doesn't take advantage of the fact that it is sorted. Since ICollection (and C# generic collections in general) don't implement any sort of efficient splice operation, lookup is your best bet.

    Turning your filter around, you get:

    var q = Enumerable.Range(0, (Int32)(upperBound - lowerBound).TotalDays+1).Select(n => new { Key = lowerBound.AddDays(n), Item = myInput[lowerBound.AddDays(n)].item1 });
    
    var myOutput = new SortedDictionary<DateTime, double>();
    
    foreach (var pair in q)
        myOutput.Add(pair.Key, pair.Item);
    

    The other methods all average about the same time. Using a very small separation in lowerBound and upperBound results in thousands of times faster performance. Even using a two year span results in hundreds of times faster performance when myInput contains 2 million entries.

    Note that the scope of speedup really depends on how many entries are in the SortedList, a small list will not show much difference in performance.