When using Compiler Explorer (https://godbolt.org/) to compare the assembly output of simple programs, why is the D output so much longer than the C or C++ output? The simple square function compiles to essentially the same code in C, C++, and D, but the D output has additional lines that are not highlighted when hovering over the square function in the source code.
Let's say I have a template function in both C++ and D (https://godbolt.org/z/64EsWo5Ke): the Intel asm output for D is 29309 lines long, while the C++ Intel asm output is only 73 lines.
This is the code in question. For D:
int example.square(int):
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], edi
mov eax, dword ptr [rbp - 4]
imul eax, dword ptr [rbp - 4]
pop rbp
ret
ldc.register_dso:
sub rsp, 40
mov qword ptr [rsp + 8], 1
lea rax, [rip + ldc.dso_slot]
mov qword ptr [rsp + 16], rax
lea rax, [rip + __start___minfo]
mov qword ptr [rsp + 24], rax
lea rax, [rip + __stop___minfo]
mov qword ptr [rsp + 32], rax
lea rax, [rsp + 8]
mov rdi, rax
call _d_dso_registry@PLT
add rsp, 40
ret
example.__ModuleInfo:
.long 2147483652
.long 0
.asciz "example"
example.__moduleRef:
.quad example.__ModuleInfo
ldc.dso_slot:
.quad 0
C/C++:
square(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov eax, DWORD PTR [rbp-4]
imul eax, eax
pop rbp
ret
As you can see, the actual implementation in assembly is almost identical. The function first sets up the stack frame:
push rbp
mov rbp, rsp
It then stores the argument (passed in edi) in the stack slot [rbp - 4], multiplies it by itself, and leaves the result in eax, the return-value register:
mov dword ptr [rbp - 4], edi
mov eax, dword ptr [rbp - 4]
imul eax, dword ptr [rbp - 4]
in D and
mov DWORD PTR [rbp-4], edi
mov eax, DWORD PTR [rbp-4]
imul eax, eax
in C/C++. Finally, it tears down the stack frame and returns:
pop rbp
ret
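The walkthrough above can be restated in C. The first function is a plausible source for the C listing (the parameter name num is an assumption, since the original source is not shown). The second spells out, one statement per instruction, what the unoptimized D listing does; the names slot and eax are hypothetical labels for the stack slot [rbp - 4] and the return register. This is an illustration, not what either compiler literally does internally.

```c
/* Plausible source for the C listing: a single multiply, nothing else. */
int square(int num) {
    return num * num;
}

/* The unoptimized D listing, one statement per instruction.
   Assumes the square fits in an int: signed overflow is undefined
   behaviour in C, whereas two-operand imul keeps the low 32 bits. */
int square_as_listed(int edi) {
    int slot = edi;  /* mov dword ptr [rbp - 4], edi  : spill the argument  */
    int eax = slot;  /* mov eax, dword ptr [rbp - 4]  : reload it           */
    eax *= slot;     /* imul eax, dword ptr [rbp - 4] : multiply by itself  */
    return eax;      /* pop rbp / ret                 : result left in eax  */
}
```

The only difference from the C listing is that GCC multiplies eax by itself (imul eax, eax) instead of re-reading the stack slot; both compute the same value.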
Now, I don't claim to know exactly what the D compiler is doing, but I assume the rest of the code exists so that this compiled module works with the D runtime and other D code: basically module metadata (example.__ModuleInfo, example.__moduleRef) and a routine that registers it with the runtime (ldc.register_dso, which calls _d_dso_registry). I assume this because square never uses any of these symbols, and the other function never calls square. This code is therefore probably needed for linking into other D programs, or the like, so you might not be able to (or should not) remove it.
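To make the registration idea more concrete, here is a rough C analogy of what the ldc.register_dso part of the listing appears to do: before main runs, it hands the runtime a small record pointing at the module's metadata. Everything here is an assumption for illustration. struct module_info, struct dso_record, dso_registry and register_dso are hypothetical names; the record layout is only read off the listing (four 8-byte stores: the value 1, which I read as a version number, the address of ldc.dso_slot, and the __start___minfo / __stop___minfo bounds); and I am assuming the real ldc.register_dso runs as a static constructor, which the listing itself does not show. The sketch relies on the GCC/Clang constructor attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for example.__ModuleInfo: the listing shows two
   32-bit words (2147483652 and 0) followed by the module name. */
struct module_info {
    uint32_t flags;
    uint32_t index;
    const char *name;
};

/* Hypothetical mirror of the record ldc.register_dso builds on its stack:
   version, address of the per-DSO slot, start/end of the module-info table. */
struct dso_record {
    size_t version;
    void **slot;
    const struct module_info *const *minfo_beg;
    const struct module_info *const *minfo_end;
};

static const struct module_info example_module_info = { 2147483652u, 0, "example" };

/* Stand-in for the __minfo section: one pointer per module,
   comparable to example.__moduleRef. */
static const struct module_info *const minfo_table[] = { &example_module_info };

/* Stand-in for ldc.dso_slot, zero-initialized like the `.quad 0`. */
static void *dso_slot = NULL;

/* Hypothetical stand-in for the runtime's _d_dso_registry:
   it only records what it was given so main can inspect it. */
static struct dso_record registered;
static int registered_count = 0;

static void dso_registry(const struct dso_record *rec) {
    registered = *rec;
    registered_count++;
}

/* GCC/Clang call constructor functions before main; this plays the
   role of ldc.register_dso. */
__attribute__((constructor))
static void register_dso(void) {
    struct dso_record rec = {
        1,
        &dso_slot,
        minfo_table,
        minfo_table + sizeof minfo_table / sizeof minfo_table[0],
    };
    dso_registry(&rec);
}
```

With this in place, main can observe that registration already happened (registered_count is 1) without ever calling register_dso, much like square never calls or references the D metadata.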
In the case of your second example, most of the code is the implementation of the output library you use. Counting only the function you defined, the output is actually 66 lines long. While that is still longer than the equivalent 22 lines of C++-generated assembly, it is not several thousand.
Edit:
As I explained in a comment, I would recommend analysing the output binaries with a tool such as Cutter or Ghidra, which gives you a more complete picture of what actually ends up in a binary. I can tell you that even in 'shorter' C++ code you will find a lot of function calls, such as _start (the usual entry point on Linux), before execution reaches main.