Search code examples
c#algorithmdictionaryprocessing-efficiency

Most efficient way to retrieve all element of a Dictionary from a list of keys?


I've a c# Dictionary<DateTime,SomeObject> instance.

I've the following code:

private Dictionary<DateTime, SomeObject> _containedObjects = ...;//Let's imagine you have ~4000 items in it

public IEnumerable<SomeObject> GetItemsList(HashSet<DateTime> requiredTimestamps){
    //How to return the list of SomeObject contained in _containedObjects
    //Knowing that rarely(~<5% of the call), one or several DateTime of "requiredTimestamps" may not be in _containedObjects
}

I'm looking how to return an IEnumerable<SomeObject> containing all element that were referenced by one of the provided keys. The only issue is that this method will be called very often, and we might not always have every given key in parameter.

So is there something more efficient than this:

private Dictionary<DateTime, SomeObject> _containedObjects = ...;//Let's imagine you have ~4000 items in it

public IEnumerable<SomeObject> GetItemsList(HashSet<DateTime> requiredTimestamps){
    List<SomeObject> toReturn = new List<SomeObject>();
    foreach(DateTime dateTime in requiredTimestamps){
        SomeObject found;
        if(_containedObjects.TryGetValue(dateTime, out found)){
            toReturn.Add(found);
        }
    }
    return toReturn;
}

Solution

  • Method 1: To make this significantly faster - this is not by changing the algorithm but by making a local copy of _containedObjects in your method and referencing the local copy for the lookup.

    Example:

    public static IEnumerable<int> GetItemsList3(HashSet<DateTime> requiredTimestamps)
    {
        var tmp = _containedObjects;
    
        List<int> toReturn = new List<int>();
        foreach (DateTime dateTime in requiredTimestamps)
        {
            int found;
    
            if (tmp.TryGetValue(dateTime, out found))
            {
                toReturn.Add(found);
            }
        }
        return toReturn;
    }
    

    Test data and times (on set of 5000 items with 125 keys found):
    Your original method (milliseconds): 2,06032186895335
    Method 1 (milliseconds): 0,53549626223609

    Method 2: One way to make this marginally quicker is to iterate through the smaller set and do the lookup on the bigger set. Depending on the size difference you will gain some speed.

    You are using a Dictionary and HashSet, so your lookup on either of these will be O(1).

    Example: If _containedObjects has less items than requiredTimestamps we loop through _containedObjects (otherwise use your method for the converse)

    public static IEnumerable<int> GetItemsList2(HashSet<DateTime> requiredTimestamps)
    {
        List<int> toReturn = new List<int>();
        foreach (var dateTime in _containedObjects)
        {
            int found;
    
            if (requiredTimestamps.Contains(dateTime.Key))
            {
                toReturn.Add(dateTime.Value);
            }
        }
        return toReturn;
    }
    

    Test data and times (on set of 5000 for _containedObjects and set of 10000 items for requiredTimestamps with 125 keys found):
    Your original method (milliseconds): 3,88056291367086
    Method 2 (milliseconds): 3,31025939438943