c#, performance, gethashcode, visitor-pattern

Is it a bad idea creating an object in a GetHashCode function?


I have a base class MyBase and about a dozen derived classes. I rely heavily on the visitor pattern in my code, so the base class is an abstract host for a visitor and each derived class is a concrete host. I am using a standalone comparer that implements the IEqualityComparer<MyBase> interface. This interface has two methods: Boolean Equals(MyBase a, MyBase b) and Int32 GetHashCode(MyBase obj). In each of them I use a visitor to resolve an instance of MyBase to an instance of one of the derived types; this is how I avoid dealing with casting. So a visitor object needs to be created on every call to Equals and GetHashCode. I've read a lot about keeping GetHashCode as cheap as possible, so the question is:

Is creating an object (a visitor) in the GetHashCode and Equals methods a bad idea from a performance standpoint?
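To make the setup concrete, here is a rough sketch of what such a comparer might look like; the type names, members, and visitor shape are illustrative guesses, not the actual code:

```csharp
using System.Collections.Generic;

public interface IMyVisitor<T>
{
    T Visit(DerivedA node);
    // ...one Visit overload per derived class
}

public abstract class MyBase
{
    public abstract T Accept<T>(IMyVisitor<T> visitor);
}

public sealed class DerivedA : MyBase
{
    public int Value { get; set; }
    public override T Accept<T>(IMyVisitor<T> visitor) => visitor.Visit(this);
}

// Computes a hash code for whichever concrete type it is dispatched to.
public sealed class HashCodeVisitor : IMyVisitor<int>
{
    public int Visit(DerivedA node) => node.Value.GetHashCode();
}

// Tests equality against a fixed "other" operand.
public sealed class EqualsVisitor : IMyVisitor<bool>
{
    private readonly MyBase _other;
    public EqualsVisitor(MyBase other) => _other = other;

    // Simplified: a type test is used here; a fully cast-free version
    // would dispatch a second visitor on _other.
    public bool Visit(DerivedA node) => _other is DerivedA a && a.Value == node.Value;
}

public sealed class MyBaseComparer : IEqualityComparer<MyBase>
{
    // A new visitor object is allocated on every call.
    public int GetHashCode(MyBase obj) => obj.Accept(new HashCodeVisitor());

    public bool Equals(MyBase a, MyBase b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.Accept(new EqualsVisitor(b));
    }
}
```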


Solution

  • Implementations of GetHashCode() should be written in such a fashion that every call after the first will be fast. If computing an object's hash value would require constructing a new object, that's a pretty good sign that the object should cache the hash value after it has been computed once, so future requests for the hash code can be answered quickly (a sketch of this follows below). Note that it probably isn't worthwhile to use a flag to indicate whether the hash code has been computed; instead, decide that zero won't be a valid hash code value, and have a HashCode field which is initialized to zero but gets overwritten with the hash code once it has been computed. If the computation would yield zero, substitute some other arbitrary value instead. Both zero and the other value should only occur about 1/4,000,000,000 of the time, so doubling the probability of that other value to 1/2,000,000,000 should be no big deal.

    If you are caching hash codes, it matters less that their computation is somewhat expensive, since it will only be done once per object. Even so, one should still try to keep the computation reasonably fast, but not worry too much about it.
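A minimal sketch of the caching idea described above, assuming the cache can live on MyBase itself and that the fields used for hashing never change after construction (names are illustrative):

```csharp
public abstract class MyBase
{
    // 0 is reserved to mean "not computed yet".
    private int _cachedHashCode;

    public int GetCachedHashCode()
    {
        if (_cachedHashCode == 0)
        {
            // The expensive part (e.g. the visitor-based computation) runs only once.
            int hash = ComputeHashCode();
            if (hash == 0)
                hash = 1;   // remap 0 so it can keep serving as the "uncomputed" marker
            _cachedHashCode = hash;
        }
        return _cachedHashCode;
    }

    // Subclasses (or a visitor) supply the actual computation.
    protected abstract int ComputeHashCode();
}
```

The comparer's GetHashCode(MyBase obj) can then simply return obj.GetCachedHashCode(), so the visitor allocation happens at most once per object rather than on every lookup.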