Search code examples
c#hashsethashcodegethashcode

Get hashcode of a HashSet based on the items present in the HashSet


I have a HashSet<T> and I want to get the hashcode of the HashSet<T> based on the items it contains. I am trying to identify a HashSet<T> based on it's values. I know I can use set operations to check if two HashSet<T>'s are equal. I need a hashcode because I want to use memoization in a function which takes a HashSet<T> as an argument.

HashSet<string> one = new HashSet<string>();
one.Add("java");
Console.WriteLine(one.GetHashCode());

HashSet<string> two = new HashSet<string>();
two.Add("java");
Console.WriteLine(two.GetHashCode() == one.GetHashCode()); 
// I want the hash code to be same for one and two,
// since they both contain the same strings.

Is there a way to identify a HashSet<T> based on it's values without overriding the GetHashCode method? If there isn't one could you please provide the override implementation for GetHashCode of a HashSet<T> of strings?


Solution

  • The HashSet<T> has a static CreateSetComparer method that could solve your problem if it worked correctly, but it doesn't.

    IEqualityComparer<HashSet<string>> comparer = HashSet<string>.CreateSetComparer();
    Console.WriteLine(comparer.GetHashCode(two) == comparer.GetHashCode(one));
    

    This will print true in your example, but if the two hashsets are configured with a non-default comparer, for example StringComparer.OrdinalIgnoreCase, and contain items that are different according to the default comparer, for example "java" and "Java", it will print false.

    The HashSetEqualityComparer<T> class below is aware of the Comparer of each hashset, and produces identical hashcodes for hashsets that contain the same elements, regardless of the order that they were inserted in the hashsets.

    public class HashSetEqualityComparer<T> : IEqualityComparer<HashSet<T>>
    {
        public bool Equals(HashSet<T> x, HashSet<T> y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x is null || y is null) return false;
            if (!ReferenceEquals(x.Comparer, y.Comparer)) return false;
            return x.SetEquals(y);
        }
    
        public int GetHashCode(HashSet<T> hashSet)
        {
            if (hashSet is null) return 0;
            HashCode hashCode = new();
            foreach (int value in hashSet
                .Select(x => hashSet.Comparer.GetHashCode(x)).Order())
            {
                hashCode.Add(value);
            }
            return hashCode.ToHashCode();
        }
    }
    

    Usage example:

    HashSetEqualityComparer<string> comparer = new();
    Console.WriteLine(comparer.GetHashCode(two) == comparer.GetHashCode(one));
    

    The HashCode is a built-in struct, that is used for combining multiple values into a single hashcode.

    Sorting the hashcodes of the contained elements is a relatively expensive O(n log n) operation. In case paying this performance cost is not desirable, you can get a valid hashcode faster using the XOR technique:

    public int GetHashCode(HashSet<T> hashSet)
    {
        if (hashSet is null) return 0;
        int hashCode = 0;
        foreach (T item in hashSet)
        {
            hashCode = hashCode ^ HashCode.Combine(hashSet.Comparer.GetHashCode(item));
        }
        return hashCode;
    }
    

    The quality of the resulting hashcode might not be as good though. The HashCode.Combine method with a single argument can be used in order to diffuse the hashcode of each element across a large range. Which improves the quality of the resulting hashcode in case, for example, that the elements are small integer numbers.