c++ assembly x86 d compiler-explorer

Compiler Explorer Assembly Output for C, C++ and D (dlang)


When using Compiler Explorer (https://godbolt.org/) to compare the assembly output of simple programs, why is the D assembly output so long compared to the C or C++ output? The output for a simple square function is the same for C, C++, and D, but the D output has additional lines that are not highlighted when hovering over the square function in the source code.

  • What are these additional lines?
  • How can I remove these lines from being generated?

As a second example, take a template function written in both C++ and D (https://godbolt.org/z/64EsWo5Ke): the Intel asm output for D is 29309 lines long, while the C++ Intel asm output is only 73 lines.


Solution

  • This is the code in question. For D:

    int example.square(int):
            push    rbp
            mov     rbp, rsp
            mov     dword ptr [rbp - 4], edi
            mov     eax, dword ptr [rbp - 4]
            imul    eax, dword ptr [rbp - 4]
            pop     rbp
            ret
    
    ldc.register_dso:
            sub     rsp, 40
            mov     qword ptr [rsp + 8], 1
            lea     rax, [rip + ldc.dso_slot]
            mov     qword ptr [rsp + 16], rax
            lea     rax, [rip + __start___minfo]
            mov     qword ptr [rsp + 24], rax
            lea     rax, [rip + __stop___minfo]
            mov     qword ptr [rsp + 32], rax
            lea     rax, [rsp + 8]
            mov     rdi, rax
            call    _d_dso_registry@PLT
            add     rsp, 40
            ret
    
    example.__ModuleInfo:
            .long   2147483652
            .long   0
            .asciz  "example"
    
    example.__moduleRef:
            .quad   example.__ModuleInfo
    
    ldc.dso_slot:
            .quad   0
    

    For C/C++:

    square(int):
            push    rbp
            mov     rbp, rsp
            mov     DWORD PTR [rbp-4], edi
            mov     eax, DWORD PTR [rbp-4]
            imul    eax, eax
            pop     rbp
            ret
    

    As you can see, the actual implementation in assembly is almost identical. The function first sets up the stack frame:

            push    rbp
            mov     rbp, rsp
    

    It then takes the argument and multiplies it by itself, leaving the result in the return register (eax):

            mov     dword ptr [rbp - 4], edi
            mov     eax, dword ptr [rbp - 4]
            imul    eax, dword ptr [rbp - 4]
    

    in D and

            mov     DWORD PTR [rbp-4], edi
            mov     eax, DWORD PTR [rbp-4]
            imul    eax, eax
    

    in C++/C, and then tears down the stack frame and returns:

            pop     rbp
            ret
    
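    For reference, a source function along the following lines would correspond to the listings above. This is only a sketch: the original source is not shown here, so the parameter name `num` and the `check_square` helper are assumptions. The function definition itself has the same spelling in C, C++, and D.

```cpp
#include <cassert>

// Assumed source for the listings above; only the signature square(int)
// is visible in the assembly, the parameter name is hypothetical.
int square(int num) {
    // Unoptimized builds spill `num` to the stack, reload it,
    // multiply it by itself, and return the product in eax.
    return num * num;
}

// Hypothetical self-check, callable from any test driver.
void check_square() {
    assert(square(3) == 9);
    assert(square(-4) == 16);
}
```

    Compiling this without optimization is what produces the stack spill and reload seen in both listings.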

    Now, I don't claim to know exactly what the D compiler is doing, but I assume the rest of the output exists so that this compiled module can work with other D code: basically metadata and other bookkeeping. I assume this because our function uses none of the extra symbols, and the other function (ldc.register_dso) never calls square. Judging by the names in the listing, it hands the range from __start___minfo to __stop___minfo to _d_dso_registry, presumably so that the D runtime knows about the module. This code therefore probably has to do with inclusion into other D programs, so you might not be able to remove it, and probably should not.

    In your second example, most of the output is the implementation of the output library. Restricted to the function you defined, the D output is actually 66 lines long. That is still longer than the equivalent 22 lines of generated C++ assembly, but it is not several thousand.

    Edit:

    As I explained in a comment, I would recommend analysing the output binaries with something like Cutter or Ghidra, which give you a more complete picture of what is actually produced in a binary. Even in 'shorter' C++ code you will find a lot of function calls, such as _entry, that run before execution gets to main.
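    To illustrate the point that code can run before main without anything in your own functions calling it, here is a loose C++ analogy. It is not what LDC emits: `registry`, `RegisterModule`, and `example_module` are hypothetical names invented for this sketch. It only shows the general pattern of a module registering itself during startup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry, loosely analogous to a runtime's list of modules.
std::vector<std::string>& registry() {
    static std::vector<std::string> modules;  // constructed on first use
    return modules;
}

// Hypothetical helper: its constructor records a module name.
struct RegisterModule {
    explicit RegisterModule(const char* name) { registry().push_back(name); }
};

// No user code calls this; the startup code constructs it before main runs.
static RegisterModule example_module("example");

int square(int num) { return num * num; }
```

    By the time main starts, `registry()` already contains "example", even though square never touches it. A disassembler such as Cutter or Ghidra would show the corresponding initializer alongside square in the binary.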