c# assembly visual-studio-2012 disassembly

Disassembly view of C# 64-bit Release code is 75% longer than 32-bit Debug code?


EDIT

I tested a 32-bit Release build, and the code was compact. The problem described below is therefore specific to 64-bit.


I'm using VS 2012 RC. Debug is 32-bit, and Release is 64-bit. Below is the Debug disassembly, followed by the Release disassembly, of one line of code:

         crc = (crc >> 8) ^ crcTable[((val & 0x0000ff00) >> 8) ^ crc & 0xff];
0000006f  mov         eax,dword ptr [ebp-40h] 
00000072  shr         eax,8 
00000075  mov         edx,dword ptr [ebp-3Ch] 
00000078  mov         ecx,0FF00h 
0000007d  and         edx,ecx 
0000007f  shr         edx,8 
00000082  mov         ecx,dword ptr [ebp-40h] 
00000085  mov         ebx,0FFh 
0000008a  and         ecx,ebx 
0000008c  xor         edx,ecx 
0000008e  mov         ecx,dword ptr ds:[03387F38h] 
00000094  cmp         edx,dword ptr [ecx+4] 
00000097  jb          0000009E 
00000099  call        6F54F5EC 
0000009e  xor         eax,dword ptr [ecx+edx*4+8] 
000000a2  mov         dword ptr [ebp-40h],eax
-----------------------------------------------------------------------------
         crc = (crc >> 8) ^ crcTable[((val & 0x0000ff00) >> 8) ^ crc & 0xff];
000000a5  mov         eax,dword ptr [rsp+20h] 
000000a9  shr         eax,8 
000000ac  mov         dword ptr [rsp+38h],eax 
000000b0  mov         rdx,124DEE68h 
000000ba  mov         rdx,qword ptr [rdx] 
000000bd  mov         eax,dword ptr [rsp+00000090h] 
000000c4  and         eax,0FF00h 
000000c9  shr         eax,8 
000000cc  mov         ecx,dword ptr [rsp+20h] 
000000d0  and         ecx,0FFh 
000000d6  xor         eax,ecx 
000000d8  mov         ecx,eax 
000000da  mov         qword ptr [rsp+40h],rdx 
000000df  mov         rax,qword ptr [rsp+40h] 
000000e4  mov         rax,qword ptr [rax+8] 
000000e8  mov         qword ptr [rsp+48h],rcx 
000000ed  cmp         qword ptr [rsp+48h],rax 
000000f2  jae         0000000000000100 
000000f4  mov         rax,qword ptr [rsp+48h] 
000000f9  mov         qword ptr [rsp+48h],rax 
000000fe  jmp         0000000000000105 
00000100  call        000000005FA5D364 
00000105  mov         rax,qword ptr [rsp+40h] 
0000010a  mov         rcx,qword ptr [rsp+48h] 
0000010f  mov         ecx,dword ptr [rax+rcx*4+10h] 
00000113  mov         eax,dword ptr [rsp+38h] 
00000117  xor         eax,ecx 
00000119  mov         dword ptr [rsp+20h],eax

What is all the extra code in the 64-bit version doing? What is it testing for? I haven't benchmarked this, but I would expect the 32-bit code to execute much faster.

EDIT

The whole function:

public static uint CRC32(uint val)
{
    uint crc = 0xffffffff;

    crc = (crc >> 8) ^ crcTable[(val & 0x000000ff) ^ crc & 0xff];
    crc = (crc >> 8) ^ crcTable[((val & 0x0000ff00) >> 8) ^ crc & 0xff];
    crc = (crc >> 8) ^ crcTable[((val & 0x00ff0000) >> 16) ^ crc & 0xff];
    crc = (crc >> 8) ^ crcTable[(val >> 24) ^ crc & 0xff];

    // flip bits
    return (crc ^ 0xffffffff);
}

Solution

  • I suspect you are using "Go to disassembly" while debugging the release build to get the assembly code.

    After going to Tools -> Options -> Debugging -> General and disabling "Suppress JIT optimization on module load", I got an x64 assembly listing without the extra error checking.

    It seems that by default, even in Release mode, the code is not optimized if the debugger is attached. Keep that in mind when benchmarking your code.

    PS: Benchmarking shows x64 to be slightly faster than x86: 4.3 vs. 4.8 seconds for 1 billion function calls.

    Edit: Breakpoints still worked for me; otherwise I wouldn't have been able to see the disassembly after unchecking the option. Your example line from above looks like this (VS 2012 RC):

    crc = (crc >> 8) ^ crcTable[((val & 0x0000ff00) >> 8) ^ crc & 0xff];
    00000030  mov         r11d,eax 
    00000033  shr         r11d,8 
    00000037  mov         ecx,edx 
    00000039  and         ecx,0FF00h 
    0000003f  shr         ecx,8 
    00000042  movzx       eax,al 
    00000045  xor         ecx,eax 
    00000047  mov         eax,ecx 
    00000049  cmp         rax,r9 
    0000004c  jae         00000000000000A4 
    0000004e  mov         eax,dword ptr [r8+rax*4+10h] 
    00000053  xor         r11d,eax
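
A timing comparison like the one mentioned above is only meaningful when the JIT is allowed to optimize, so a benchmark could warn when a debugger is attached. The following is a minimal sketch, not the benchmark actually used for the numbers quoted here. The class name `CrcBenchmark`, the `Run` helper, and the table contents are assumptions: the question does not show how `crcTable` is built, so the standard reflected CRC-32 polynomial `0xEDB88320` is used for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness; names and table construction are assumptions.
public static class CrcBenchmark
{
    // Assumption: standard reflected CRC-32 table (polynomial 0xEDB88320).
    // The original post does not show how crcTable is initialized.
    private static readonly uint[] crcTable = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // The function from the question, unchanged.
    public static uint CRC32(uint val)
    {
        uint crc = 0xffffffff;

        crc = (crc >> 8) ^ crcTable[(val & 0x000000ff) ^ crc & 0xff];
        crc = (crc >> 8) ^ crcTable[((val & 0x0000ff00) >> 8) ^ crc & 0xff];
        crc = (crc >> 8) ^ crcTable[((val & 0x00ff0000) >> 16) ^ crc & 0xff];
        crc = (crc >> 8) ^ crcTable[(val >> 24) ^ crc & 0xff];

        // flip bits
        return (crc ^ 0xffffffff);
    }

    // Times the given number of calls and returns the elapsed milliseconds.
    public static long Run(uint iterations)
    {
        uint sink = 0;          // accumulate results so the calls are not dead code
        CRC32(0);               // warm-up call so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (uint i = 0; i < iterations; i++)
            sink ^= CRC32(i);
        sw.Stop();
        Console.WriteLine("sink = {0:X8}", sink);
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; JIT optimizations may be suppressed.");

        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
        Console.WriteLine(IntPtr.Size == 8 ? "Running as x64" : "Running as x86");
        Console.WriteLine("Elapsed: {0} ms", Run(1000000000));
    }
}
```

Building in Release and starting with Ctrl+F5 (Start Without Debugging) avoids the warning path. Running the same binary once as x86 and once as x64 then gives a comparison of optimized code on both platforms.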