Search code examples
c#stringdictionaryequalsgethashcode

Slow dictionary with custom class key


I have a custom class that I was trying to use as a key for a dictionary:

// I tried setting more than enough capacity also...
var dict = new Dictionary<MyPoint, MyPoint>(capacity);

Now let me be clear, the goal here is to compare two SIMILAR but DIFFERENT lists, using X, Y, and Date as a composite key. The values will vary between these two lists, and I'm trying to quickly compare them and compute their differences.

Here is the class code:

public class MyPoint : IEquatable<MyPoint>
{
    public short X { get; set; }
    public short Y { get; set; }
    public DateTime Date { get; set; }
    public double MyValue { get; set; }

    public override bool Equals(object obj)
    {
        return base.Equals(obj as MyPoint);
    }

    public bool Equals(MyPoint other)
    {
        if (other == null)
        {
            return false;
        }

        return (Date == other.Date)
            && (X == other.X)
            && (Y == other.Y);
    }

    public override int GetHashCode()
    {
        return Date.GetHashCode()
             | X.GetHashCode()
             | Y.GetHashCode();
    }
}

I also tried keying with a struct:

public struct MyPointKey
{
    public short X;
    public short Y;
    public DateTime Date;
    // The value is not on these, because the struct is only used as key
}

In both cases dictionary writing was very, very slow (reading was quick).

I changed the key to a string, with the format:

var dict = new Dictionary<string, MyPoint>(capacity);
var key = string.Format("{0}_{1}", item.X, item.Y);

I was amazed at how much quicker this is -- it's at least 10 times faster. I tried Release mode, no debugger, and every scenario I could think of.

This dictionary will contain 350,000 or more items, so performance does matter.

Any thoughts or suggestions? Thanks!

Another edit...

I'm trying to compare two lists of things in the fastest way I can. This is what I'm working with. The Dictionary is important for fast lookups against the source list.

IList<MyThing> sourceList;
IDictionary<MyThing, MyThing> comparisonDict;

Parallel.ForEach(sourceList,
    sourceItem =>
    {
        double compareValue = 0;

        MyThing compareMatch = null;
        if (comparisonDict.TryGetValue(sourceItem, out compareMatch))
        {
            compareValue = compareMatch.MyValue;
        }

        // Do a delta check on the item
        double difference = sourceItem.MyValue- compareValue;
        if (Math.Abs(difference) > 1)
        {
            // Record the difference...
        }
    });

Solution

  • As others have said in the comments, the problem is in your GetHashCode() implementation. Taking your code, and running 10,000,000 iterations with the string key took 11-12 seconds. Running with your existing hashCode I stopped it after over a minute. Using the following hashCode implementation took under 5 seconds.

    public override int GetHashCode()
    {
        var hashCode = Date.GetHashCode();
        hashCode = (hashCode * 37) ^ X.GetHashCode();
        hashCode = (hashCode * 37) ^ Y.GetHashCode();
        return hashCode;
    }
    

    The problem is that when you get into large numbers, the items are all colliding in the same buckets, due to the ORs. A dictionary where everything is in the same bucket is just a list.