Search code examples
c#.netgethashcodeiequalitycomparer

What's the role of GetHashCode in the IEqualityComparer<T> in .NET?


I'm trying to understand the role of the GetHashCode method of the interface IEqualityComparer.

The following example is taken from MSDN:

using System;
using System.Collections.Generic;
class Example {
    static void Main() {
        try {

            BoxEqualityComparer boxEqC = new BoxEqualityComparer();

            Dictionary<Box, String> boxes = new Dictionary<Box,
                                                string>(boxEqC);

            Box redBox = new Box(4, 3, 4);
            Box blueBox = new Box(4, 3, 4);

            boxes.Add(redBox, "red");
            boxes.Add(blueBox, "blue");

            Console.WriteLine(redBox.GetHashCode());
            Console.WriteLine(blueBox.GetHashCode());
        }
        catch (ArgumentException argEx) {

            Console.WriteLine(argEx.Message);
        }
    }
}

public class Box {
    public Box(int h, int l, int w) {
        this.Height = h;
        this.Length = l;
        this.Width = w;
    }
    public int Height { get; set; }
    public int Length { get; set; }
    public int Width { get; set; }
}

class BoxEqualityComparer : IEqualityComparer<Box> {

    public bool Equals(Box b1, Box b2) {
        if (b1.Height == b2.Height & b1.Length == b2.Length
                            & b1.Width == b2.Width) {
            return true;
        }
        else {
            return false;
        }
    }

    public int GetHashCode(Box bx) {
        int hCode = bx.Height ^ bx.Length ^ bx.Width;
        return hCode.GetHashCode();
    }
}

Shouldn't the Equals method implementation be enough to compare two Box objects? That is where we tell the framework the rule used to compare the objects. Why is the GetHashCode needed?

Thanks.

Lucian


Solution

  • A bit of background first...

    Every object in .NET has an Equals method and a GetHashCode method.

    The Equals method is used to compare one object with another object - to see if the two objects are equivalent.

    The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects - so the hash code is not necessarily unique.

    A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.

    When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.

    When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.

    Now to explain why both methods are necessary, consider this example:

    BoxEqualityComparer boxEqC = new BoxEqualityComparer(); 
    
    Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC); 
    
    Box redBox = new Box(100, 100, 25);
    Box blueBox = new Box(1000, 1000, 25);
    
    boxes.Add(redBox, "red"); 
    boxes.Add(blueBox, "blue"); 
    

    Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode - 100^100^25 = 1000^1000^25 = 25 - even though they are clearly not the same object. The reason that they are the same hashcode in this case is because you are using the ^ (bitwise exclusive-OR) operator so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same key, we call that a collision.

    When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.

    In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.

    TLDR

    So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn't have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.