I ran into less than ideal inlining behavior of the .NET JIT compiler. The following code is stripped of its context, but it demonstrates the problem:
using System.Runtime.CompilerServices;
namespace HashTest
{
public class Hasher
{
private const int hashSize = sizeof(ulong) * 8;
public int SmallestMatch;
public int Offset;
public void Hash_Inline(ref ulong hash, byte[] data, int curIndex)
{
hash = (hash << 1) | (hash >> (hashSize - 1));
hash ^= data[curIndex];
if (curIndex < SmallestMatch)
{
ulong value = data[curIndex - SmallestMatch];
value = (value << Offset) | (value >> (hashSize - Offset));
hash ^= value;
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void RotateLeft(ref ulong hash, int by)
{
hash = (hash << by) | (hash >> (hashSize - by));
}
public void Hash_FunctionCall(ref ulong hash, byte[] data, int curIndex)
{
RotateLeft(ref hash, curIndex);
hash ^= data[curIndex];
if (curIndex < SmallestMatch)
{
ulong value = data[curIndex - SmallestMatch];
RotateLeft(ref value, Offset);
hash ^= value;
}
}
}
}
Here is the assembly generated at runtime for a Release build (only the first statement of each function, up to the xor in the second statement):
net472, inline code:
000007FE8E8E0A20 sub rsp,28h
000007FE8E8E0A24 rol qword ptr [rdx],1
net472, aggressive inlining:
000007FE8E8E09B0 sub rsp,28h
000007FE8E8E09B4 mov qword ptr [rsp+30h],rcx
000007FE8E8E09B9 mov ecx,r9d
000007FE8E8E09BC rol qword ptr [rdx],cl
net5.0, inline code:
000007FE67E56040 push rbp
000007FE67E56041 sub rsp,30h
000007FE67E56045 lea rbp,[rsp+30h]
000007FE67E5604A xor eax,eax
000007FE67E5604C mov qword ptr [rbp-8],rax
000007FE67E56050 mov qword ptr [rbp+10h],rcx
000007FE67E56054 mov qword ptr [rbp+18h],rdx
000007FE67E56058 mov qword ptr [rbp+20h],r8
000007FE67E5605C mov dword ptr [rbp+28h],r9d
000007FE67E56060 mov rcx,qword ptr [rbp+18h]
000007FE67E56064 rol qword ptr [rcx],1
net5.0, aggressive inlining:
000007FE67E55B30 push rbp
000007FE67E55B31 sub rsp,30h
000007FE67E55B35 lea rbp,[rsp+30h]
000007FE67E55B3A xor eax,eax
000007FE67E55B3C mov qword ptr [rbp-8],rax
000007FE67E55B40 mov qword ptr [rbp+10h],rcx
000007FE67E55B44 mov qword ptr [rbp+18h],rdx
000007FE67E55B48 mov qword ptr [rbp+20h],r8
000007FE67E55B4C mov dword ptr [rbp+28h],r9d
000007FE67E55B50 mov rcx,qword ptr [rbp+10h]
000007FE67E55B54 mov rdx,qword ptr [rbp+18h]
000007FE67E55B58 mov r8d,dword ptr [rbp+28h]
000007FE67E55B5C call CLRStub[MethodDescPrestub]@7fe67e55558 (07FE67E55558h)
000007FE67E56010 push rbp
000007FE67E56011 mov rbp,rsp
000007FE67E56014 mov qword ptr [rbp+10h],rcx
000007FE67E56018 mov qword ptr [rbp+18h],rdx
000007FE67E5601C mov dword ptr [rbp+20h],r8d
000007FE67E56020 mov ecx,dword ptr [rbp+20h]
000007FE67E56023 mov rax,qword ptr [rbp+18h]
000007FE67E56027 rol qword ptr [rax],cl
000007FE67E5602A pop rbp
000007FE67E5602B ret
Why don't they all generate the same code, i.e. the two-instruction version in the first sample? Is there a way to put the rotate in a function instead of having to write it inline by hand?
EDIT: I found a bug in the RotateLeft code I posted. Fixing it improved the generated assembly quite a bit, but it still has issues.
The functions Hash_Inline and Hash_FunctionCall are not equivalent.
First, Hash_Inline rotates by 1, but Hash_FunctionCall rotates by curIndex.
Second, for RotateLeft you probably meant:
private void RotateLeft(ref ulong hash, int by)
{
hash = (hash << by) | (hash >> (hashSize - by));
}
(NOTE: This question has been edited to fix this issue)
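To make the comparison concrete, here is a minimal sketch of the rotate-and-xor step written both ways, with the helper called with 1 instead of curIndex. The class name RotateCheck and the methods StepInline and StepFunctionCall are hypothetical names made up for this illustration, and the methods are reduced to the first two statements of the original functions, so this is not the asker's full code.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper class for illustration only.
public static class RotateCheck
{
    private const int HashSize = sizeof(ulong) * 8;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static void RotateLeft(ref ulong hash, int by)
    {
        hash = (hash << by) | (hash >> (HashSize - by));
    }

    // Rotation written out by hand, as in Hash_Inline.
    public static ulong StepInline(ulong hash, byte value)
    {
        hash = (hash << 1) | (hash >> (HashSize - 1));
        return hash ^ value;
    }

    // Same step through the helper, rotating by 1 rather than by curIndex.
    public static ulong StepFunctionCall(ulong hash, byte value)
    {
        RotateLeft(ref hash, 1);
        return hash ^ value;
    }
}
```

With the rotation amounts matching, both methods compute the same value for any input, so any remaining difference in the disassembly comes from the JIT rather than from the source. On .NET Core 3.0 and later, System.Numerics.BitOperations.RotateLeft(ulong, int) provides the same rotation as a library call, which may be an alternative to a hand-written helper.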
If you fix these two things, the JIT compiler generates identical native code for both functions on .NET 5 (but not .NET Framework): See the disassembly here.
Also, if you are using .NET Core 3.0 or later and you want the disassembly of the fully optimized code, you need to call the function a sufficiently large number of times (to trigger a tier 1 compilation) before you get the disassembly, or use MethodImplOptions.AggressiveOptimization.
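The two options can be sketched roughly as follows. TieringSketch, RotateAndXor and WarmUp are hypothetical names used only for this example, and the number of calls needed to reach tier 1 is a runtime implementation detail, so the iteration count is left to the caller.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical class for illustration only.
public static class TieringSketch
{
    private const int HashSize = sizeof(ulong) * 8;

    // Option 1: ask the JIT for fully optimized code on the first compilation,
    // so the method does not go through tier 0 first.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static ulong RotateAndXor(ulong hash, byte value)
    {
        hash = (hash << 1) | (hash >> (HashSize - 1));
        return hash ^ value;
    }

    // Option 2: call the method many times before inspecting its disassembly,
    // giving tiered compilation a chance to promote it.
    // The result is returned so the calls cannot be discarded as dead code.
    public static ulong WarmUp(int iterations)
    {
        ulong hash = 1;
        for (int i = 0; i < iterations; i++)
        {
            hash = RotateAndXor(hash, (byte)i);
        }
        return hash;
    }
}
```

Promotion to tier 1 happens on a background thread, so after a warm-up loop it may be necessary to wait briefly before capturing the disassembly.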