What is causing this implementation of GetHashCode to be 20 times slower than .net's implementation?

I got the idea of a Substring struct from this post and this one. The second post has the implementation of .net's String.GetHashCode(). (I'm not sure which version of .net this is from.)

Here is the implementation. (GetHashCode is taken from the second source listed above.)

public struct Substring
{
    private string String;
    private int Offset;
    public int Length { get; private set; }
    public char this[int index] { get { return String[Offset + index]; } }

    public Substring(string str, int offset, int len) : this()
    {
        String = str;
        Offset = offset;
        Length = len;
    }

    /// <summary>
    /// See http://www.dotnetperls.com/gethashcode
    /// </summary>
    /// <returns></returns>
    public unsafe override int GetHashCode()
    {
        fixed (char* str = String + Offset)
        {
            char* chPtr = str;
            int num = 352654597;
            int num2 = num;
            int* numPtr = (int*)chPtr;
            for (int i = Length; i > 0; i -= 4)
            {
                num = (((num << 5) + num) + (num >> 27)) ^ numPtr[0];
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 27)) ^ numPtr[1];
                numPtr += 2;
            }
            return (num + (num2 * 1566083941));
        }
    }
}

Here's a unit test:

    [Test]
    public void GetHashCode_IsAsFastAsString()
    {
        var s = "The quick brown fox";
        var sub = new Substring(s, 1, 5);
        var t = "quick";
        var sum = 0;

        sum += sub.GetHashCode(); // make sure GetHashCode is jitted 

        var count = 100000000;
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += t.GetHashCode();
        var t1 = sw.Elapsed;
        sw = Stopwatch.StartNew();
        for (var i = 0; i < count; ++i)
            sum += sub.GetHashCode();
        var t2 = sw.Elapsed;

        Debug.WriteLine(sum.ToString()); // make sure we use the return value
        var m1 = t1.Milliseconds;
        var m2 = t2.Milliseconds;
        Assert.IsTrue(m2 <= m1); // fat chance
    }

The problem is that m1 is 10 milliseconds and m2 is 190 milliseconds. (Note: this is with 1000000 iterations.) FYI, I ran this on .net 4.5 64 bit Release build with Optimizations turned on.

Solution

Clued by a comment, I double checked to make sure that optimized code was running. It turns out that an obscure Debugger setting was disabling optimizations. So I unchecked Tools – Options – Debugging – General – Suppress JIT optimization on module load (Managed only). This caused optimized code to load properly.
Even with optimizations turned on there is still a about a 3x - 6x difference. However, this might be attributable to the fact that the code above is the .net 32 bit version and I'm running 64 bit .net. Porting the 64 bit implementation of string.GetHashCode to Substring is not as easy because it relies on a zero end of string marker (which is in fact a bug).

At this time I'm disappointed about not getting parity performance, but this was an excellent use of my time in learning about some of the perils and pitfalls of optimizing C#.