
Why does the .NET CLR not inline this properly?


I ran into less than ideal inlining behavior of the .NET JIT compiler. The following code is stripped of its context, but it demonstrates the problem:

using System.Runtime.CompilerServices;

namespace HashTest
{
    public class Hasher
    {
        private const int hashSize = sizeof(ulong) * 8;
        public int SmallestMatch;
        public int Offset;
        public void Hash_Inline(ref ulong hash, byte[] data, int curIndex)
        {
            hash = (hash << 1) | (hash >> (hashSize - 1));
            hash ^= data[curIndex];
            if (curIndex < SmallestMatch)
            {
                ulong value = data[curIndex - SmallestMatch];
                value = (value << Offset) | (value >> (hashSize - Offset));
                hash ^= value;
            }
        }
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void RotateLeft(ref ulong hash, int by)
        {
            hash = (hash << by) | (hash >> (hashSize - by));
        }
        public void Hash_FunctionCall(ref ulong hash, byte[] data, int curIndex)
        {
            RotateLeft(ref hash, curIndex);
            hash ^= data[curIndex];
            if (curIndex < SmallestMatch)
            {
                ulong value = data[curIndex - SmallestMatch];
                RotateLeft(ref value, Offset);
                hash ^= value;
            }
        }
    }
}

Here is the assembly each version generates at runtime in a Release build. Only the start of each function is shown, up to the xor on its second line:

net472, inline code:

000007FE8E8E0A20  sub         rsp,28h  
000007FE8E8E0A24  rol         qword ptr [rdx],1  

net472, aggressive inlining:

000007FE8E8E09B0  sub         rsp,28h  
000007FE8E8E09B4  mov         qword ptr [rsp+30h],rcx  
000007FE8E8E09B9  mov         ecx,r9d  
000007FE8E8E09BC  rol         qword ptr [rdx],cl

net5.0, inline code:

000007FE67E56040  push        rbp  
000007FE67E56041  sub         rsp,30h  
000007FE67E56045  lea         rbp,[rsp+30h]  
000007FE67E5604A  xor         eax,eax  
000007FE67E5604C  mov         qword ptr [rbp-8],rax  
000007FE67E56050  mov         qword ptr [rbp+10h],rcx  
000007FE67E56054  mov         qword ptr [rbp+18h],rdx  
000007FE67E56058  mov         qword ptr [rbp+20h],r8  
000007FE67E5605C  mov         dword ptr [rbp+28h],r9d  
000007FE67E56060  mov         rcx,qword ptr [rbp+18h]  
000007FE67E56064  rol         qword ptr [rcx],1  

net5.0, aggressive inlining:

000007FE67E55B30  push        rbp  
000007FE67E55B31  sub         rsp,30h  
000007FE67E55B35  lea         rbp,[rsp+30h]  
000007FE67E55B3A  xor         eax,eax  
000007FE67E55B3C  mov         qword ptr [rbp-8],rax  
000007FE67E55B40  mov         qword ptr [rbp+10h],rcx  
000007FE67E55B44  mov         qword ptr [rbp+18h],rdx  
000007FE67E55B48  mov         qword ptr [rbp+20h],r8  
000007FE67E55B4C  mov         dword ptr [rbp+28h],r9d  
000007FE67E55B50  mov         rcx,qword ptr [rbp+10h]  
000007FE67E55B54  mov         rdx,qword ptr [rbp+18h]  
000007FE67E55B58  mov         r8d,dword ptr [rbp+28h]  
000007FE67E55B5C  call        CLRStub[MethodDescPrestub]@7fe67e55558 (07FE67E55558h)  


000007FE67E56010  push        rbp  
000007FE67E56011  mov         rbp,rsp  
000007FE67E56014  mov         qword ptr [rbp+10h],rcx  
000007FE67E56018  mov         qword ptr [rbp+18h],rdx  
000007FE67E5601C  mov         dword ptr [rbp+20h],r8d  
000007FE67E56020  mov         ecx,dword ptr [rbp+20h]  
000007FE67E56023  mov         rax,qword ptr [rbp+18h]  
000007FE67E56027  rol         qword ptr [rax],cl  
000007FE67E5602A  pop         rbp  
000007FE67E5602B  ret  

Why don't they all generate the same code, i.e. the two-instruction version in the first sample? Is there a way to put the rotate in a function instead of having to write it inline by hand?

EDIT: I found a bug in the RotateLeft code I posted. Fixing it improved the generated assembly quite a bit, but it still has issues.


Solution

  • The functions Hash_Inline and Hash_FunctionCall are not equivalent:

    • The first statement in Hash_Inline rotates by 1, but in Hash_FunctionCall it rotates by curIndex.
    • For RotateLeft you probably meant:
    private void RotateLeft(ref ulong hash, int by)
    {
        hash = (hash << by) | (hash >> (hashSize - by));
    }
    

    (NOTE: This question has been edited to fix this issue)

    If you fix these two things, the JIT compiler generates identical native code for both functions on .NET 5 (but not on .NET Framework).
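As a minimal sketch of the two fixes above, the following reduces each function to its first two statements and rotates by 1 in both. The class name RotateCheck and the method names StepInline and StepFunctionCall are hypothetical and chosen only for illustration. The methods are static and take the hash by value so the check stays self-contained.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper class, not part of the original Hasher.
public static class RotateCheck
{
    private const int HashSize = sizeof(ulong) * 8;

    // Same body as the corrected RotateLeft from the answer.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void RotateLeft(ref ulong hash, int by)
    {
        hash = (hash << by) | (hash >> (HashSize - by));
    }

    // First two statements of Hash_Inline: rotate by 1, then xor in the byte.
    public static ulong StepInline(ulong hash, byte value)
    {
        hash = (hash << 1) | (hash >> (HashSize - 1));
        return hash ^ value;
    }

    // Same step through the helper, rotating by 1 instead of by curIndex.
    public static ulong StepFunctionCall(ulong hash, byte value)
    {
        RotateLeft(ref hash, 1);
        return hash ^ value;
    }
}
```

Both methods return the same value for the same input, so any remaining difference in the disassembly comes from the JIT and not from the source. On .NET Core 3.0 or later, System.Numerics.BitOperations.RotateLeft(ulong, int) provides the same rotate as a library call, which may be worth comparing against a hand-written helper.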

    Also, if you are using .NET Core 3.0 or later and you want the disassembly of the fully optimized code, you need to call the function a sufficiently large number of times (to trigger a tier-1 compilation) before you capture the disassembly, or mark it with MethodImplOptions.AggressiveOptimization. Otherwise you are looking at unoptimized tier-0 code, in which calls are not inlined.
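The two options in the last paragraph could look roughly like this. It is a sketch under assumptions: TieringSketch, RotateOptimized, RotateTiered and WarmUp are hypothetical names, and the iteration count passed to WarmUp is arbitrary. The number of calls that actually triggers tier-1 compilation, and the delay before it happens in the background, are runtime implementation details.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper class for illustration only.
public static class TieringSketch
{
    private const int HashSize = sizeof(ulong) * 8;

    // Option 1: opt this method out of tiering so that it is compiled
    // with full optimizations the first time it is called.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static ulong RotateOptimized(ulong hash, int by)
    {
        return (hash << by) | (hash >> (HashSize - by));
    }

    // Option 2: leave the method under normal tiering...
    public static ulong RotateTiered(ulong hash, int by)
    {
        return (hash << by) | (hash >> (HashSize - by));
    }

    // ...and call it repeatedly before inspecting its disassembly.
    // The caller picks the iteration count; no particular value is
    // guaranteed to be enough.
    public static ulong WarmUp(int iterations)
    {
        ulong h = 1;
        for (int i = 0; i < iterations; i++)
        {
            h = RotateTiered(h, 1);
        }
        return h;
    }
}
```

With option 2, the re-JIT runs on a background thread, so a debugger attached right after the loop may still show the tier-0 code. Option 1 avoids that timing question at the cost of opting the method out of tiering altogether.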