Implementation of Object.GetHashCode()


I'm reading Effective C# and there is a comment about Object.GetHashCode() that I didn't understand:

Object.GetHashCode() uses an internal field in the System.Object class to generate the hash value. Each object created is assigned a unique object key, stored as an integer, when it is created.
These keys start at 1 and increment every time a new object of any type gets created. The object identity field is set in the System.Object constructor and cannot be modified later. Object.GetHashCode() returns this value as the hash code for a given object.

I tried to look at the documentation of Object.GetHashCode() and didn't find any information about this.

I wrote a simple piece of code to print the hash codes of newly created objects:

using System;

namespace TestGetHashCode
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int i = 0; i < 100; i++)
            {
                object o = new object();
                Console.WriteLine(o.GetHashCode());
            }
        }
    }
}

The first few numbers that were printed were:

37121646,
45592480,
57352375,
2637164,
41014879,
3888474,
25209742,
26966483,
31884011

These values don't seem to fit the claim that:

These keys start at 1 and increment every time a new object of any type gets created...Object.GetHashCode() returns this value

Then, to find this "internal field in the System.Object class", I looked at the sources decompiled by ReSharper, but all I found was:

[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
[__DynamicallyInvokable]
public virtual int GetHashCode()
{
  return RuntimeHelpers.GetHashCode(this);
}

Again using the decompiled sources, I found that RuntimeHelpers.GetHashCode is declared as:

[SecuritySafeCritical]
[__DynamicallyInvokable]
[MethodImpl(MethodImplOptions.InternalCall)]
public static int GetHashCode(object o);

Judging by the MethodImpl(MethodImplOptions.InternalCall) attribute, the implementation lives inside the runtime where I can't view it, so this is a dead end for me.

Can someone please explain the author's comment (the first quote)?

What is the internal field within the Object class, and how is it used in the implementation of Object.GetHashCode()?


Solution

  • Okay, I'd better write this up. The book is very inaccurate. The value returned by Object.GetHashCode() is generated inside the CLR, on demand, the first time GetHashCode() is called on an object. I'll quote the code from the SSCLI20 distribution: clr/src/vm/thread.h has the function that produces the number, and it looks like this (edited for readability):

    inline DWORD GetNewHashCode()
    {
        // Every thread has its own generator for hash codes so that we won't get into a 
        // situation where two threads consistently give out the same hash codes.
        // Choice of multiplier guarantees period of 2**32
        // see Knuth Vol 2 p16 (3.2.1.2 Theorem A).
        DWORD multiplier = m_ThreadId*4 + 5;
        m_dwHashCodeSeed = m_dwHashCodeSeed*multiplier + 1;
        return m_dwHashCodeSeed;
    }
    

    The result is then stored in the so-called sync block of the object, so subsequent calls return the same value. Only 26 of the 32 generated bits are actually stored, because the sync block needs space for some status bits. That is still plenty to produce a very high-quality hash code; collisions are quite rare.
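The "generate on first call, then cache" behaviour described above can be sketched in standalone C++. This is only an illustration, not the CLR's code: `ObjectHeader`, `HashGenerator`, `getHashCode`, and the use of 0 as a "no hash yet" marker are hypothetical names and choices made for this sketch, and keeping the top 26 bits (rather than some other 26) is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-object storage that caches the hash.
struct ObjectHeader {
    std::uint32_t cachedHash = 0;  // 0 means "no hash assigned yet" in this sketch
};

// Stand-in for the per-thread generator quoted above.
struct HashGenerator {
    std::uint32_t threadId;
    std::uint32_t seed;

    std::uint32_t next() {
        std::uint32_t multiplier = threadId * 4 + 5;
        seed = seed * multiplier + 1;  // unsigned arithmetic wraps modulo 2^32
        return seed;
    }
};

constexpr std::uint32_t kHashBits = 26;  // number of bits kept, per the text above

// Returns the object's hash, generating and caching it on the first call.
std::uint32_t getHashCode(ObjectHeader& obj, HashGenerator& gen) {
    if (obj.cachedHash == 0) {
        std::uint32_t h;
        do {
            // Assumption: keep the top 26 bits of the generated value.
            h = gen.next() >> (32 - kHashBits);
        } while (h == 0);  // skip 0 because this sketch uses it as the sentinel
        obj.cachedHash = h;
    }
    return obj.cachedHash;
}
```

Calling `getHashCode` twice on the same `ObjectHeader` returns the same number, while a second object gets the next value from the generator. That matches the output in the question: the numbers look random and are not sequential.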

    The presence of the m_ThreadId variable in that code deserves an explanation. The random number generator's seed is stored separately for each thread, a trick that avoids having to take a lock.

    The m_dwHashCodeSeed is initialized in the Thread constructor like this:

       // Initialize this variable to a very different start value for each thread
       // Using linear congruential generator from Knuth Vol. 2, p. 102, line 24
       dwHashCodeSeed = dwHashCodeSeed * 1566083941 + 1;
       m_dwHashCodeSeed = dwHashCodeSeed;
    

    with:

       static  DWORD dwHashCodeSeed = 123456789;
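To see how the two excerpts fit together, here is a minimal standalone simulation in C++. The class `SimThread` and the variable `g_hashCodeSeed` are illustrative names, not the CLR's own, and the synchronization the real runtime needs around thread creation is not modeled.

```cpp
#include <cstdint>

// Shared starting seed, mirroring the static dwHashCodeSeed shown above.
static std::uint32_t g_hashCodeSeed = 123456789u;

// Simulated runtime thread (hypothetical; not the CLR's Thread class).
class SimThread {
public:
    explicit SimThread(std::uint32_t threadId) : threadId_(threadId) {
        // Advance the shared seed so each new thread starts far from the others.
        g_hashCodeSeed = g_hashCodeSeed * 1566083941u + 1;
        seed_ = g_hashCodeSeed;
    }

    // Same recurrence as GetNewHashCode() in the excerpt above.
    std::uint32_t getNewHashCode() {
        std::uint32_t multiplier = threadId_ * 4 + 5;
        seed_ = seed_ * multiplier + 1;  // wraps modulo 2^32
        return seed_;
    }

    std::uint32_t seed() const { return seed_; }

private:
    std::uint32_t threadId_;
    std::uint32_t seed_;
};
```

The multiplier `threadId * 4 + 5` always leaves remainder 1 when divided by 4, and the increment 1 is odd. Those are the conditions under which a linear congruential generator modulo 2^32 reaches its full period, which is what the "period of 2**32" comment in the source refers to.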