I am exploring the speed of C++ compared to other languages such as Python. The following code counts to 100 million and prints out the final number (100 million). I'm wondering if the compiler is able to tell that the variable n is used only to be printed, and so it can skip the loop entirely and just print out the final value.
#include <iostream>
int main() {
    size_t n = 0;
    while (n < 100'000'000) {
        n++;
    }
    std::cout << n << std::endl;
    return 0;
}
I thought that perhaps the answer lies in looking at the assembly code, but I'm not really sure how this works.
Will the compiler know to skip this loop?
It depends on the optimization level.
If you compile without optimizations, the compiler translates your code more or less literally and keeps the loop. If you compile with optimizations enabled, the compiler can work out that the loop only produces a constant final value and remove the loop entirely.
I have compiled your code with gcc at Compiler Explorer.
Without optimizations (-O0):
main:
        push rbp
        mov rbp, rsp
        sub rsp, 16
        mov QWORD PTR [rbp-8], 0
        jmp .L2
.L3:
        add QWORD PTR [rbp-8], 1
.L2:
        cmp QWORD PTR [rbp-8], 99999999
        jbe .L3
        mov rax, QWORD PTR [rbp-8]
        mov rsi, rax
        mov edi, OFFSET FLAT:std::cout
        call std::basic_ostream<char, std::char_traits<char> >::operator<<(unsigned long)
        mov esi, OFFSET FLAT:std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
        mov rdi, rax
        call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&))
        mov eax, 0
        leave
        ret
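The instructions from jmp .L2 up to jbe .L3 are the loop you programmed: the counter lives on the stack at [rbp-8] and is incremented by 1 for as long as it is at most 99999999.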
With optimizations (-Os):
main:
        push rax
        mov esi, 100000000
        mov edi, OFFSET FLAT:std::cout
        call std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<unsigned long>(unsigned long)
        mov rdi, rax
        call std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
        xor eax, eax
        pop rdx
        ret
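The mov esi, 100000000 loads the final value of n directly into the register used to pass it to the output function. The loop itself is gone: there is no counter, no comparison, and no backward jump.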
I chose the optimization level -Os here because it optimizes for size, which means less assembly to look at (note that less assembly doesn't necessarily mean faster execution). However, the compiler already removes the loop at the lowest optimization level (-O1).
For an overview of the gcc optimization levels, you can take a look at this answer.
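If reading assembly feels unfamiliar, a rough timing comparison can make the same point. The following is only a sketch and not part of the measurements above: count_plain, count_volatile and time_ms are hypothetical helper names introduced here, and the volatile counter is an assumption used purely to force the compiler to keep a loop for comparison.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helpers for illustration; none of these names come from the
// original question or answer.

// Same loop as in the question. With optimizations enabled, the compiler is
// allowed to replace it with the final value.
std::size_t count_plain(std::size_t limit) {
    std::size_t n = 0;
    while (n < limit) {
        n++;
    }
    return n;
}

// Same loop, but with a volatile counter. Every access to a volatile object
// counts as observable behavior, so the compiler has to keep the loop.
std::size_t count_volatile(std::size_t limit) {
    volatile std::size_t n = 0;
    while (n < limit) {
        n = n + 1;  // written out because ++ on volatile is deprecated since C++20
    }
    return n;
}

// Calls f(limit) once, stores its return value in result, and returns the
// elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F f, std::size_t limit, std::size_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f(limit);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(count_plain, 100'000'000, result) and time_ms(count_volatile, 100'000'000, result) from main and printing both durations should show the difference. On an optimized build I would expect the plain version to take close to no time while the volatile version still takes measurable time, and at -O0 both should take comparable time. The exact numbers depend on your machine and compiler, so treat this as an indicator and check the assembly if you need certainty.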