I had this piece of code.
#include <stdio.h>
int main() {
    int i;
    scanf("%d", &i);
    printf("%x", i);
}
When I give it the character 'a' as input, it prints seemingly random numbers such as "73152c" or "66152c".
But when I change the code to this,
#include <stdio.h>
int main() {
    int i;
    int j = scanf("%d", &i);
    printf("%x %d", i, j);
}
the output is always "2 0" for the same input.
So, does using the return value of a function change its behavior?
I'm using Windows 10 64-bit with GCC 8.1.0, compiling with no switches.
I used godbolt.org to examine the assembly code that GCC 8.1.0 generates with no switches. Here is the assembly code for the main routine in your first program:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-4]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call __isoc99_scanf
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
and here is the code for your second program:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-8]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call __isoc99_scanf
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
They differ in two places. In the first, this instruction passes the address of i to scanf:
lea rax, [rbp-4]
In the second, it is this instruction:
lea rax, [rbp-8]
These are different because, in your second program, the compiler has included space for j on the stack. For whatever reason, it decided to put j at rbp-4, the space used for i in the first program. This bumped i to rbp-8.
Then the code differs where the first program passes i to printf:
mov eax, DWORD PTR [rbp-4]
and the second passes i and j:
mov eax, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
And now we see why your programs print different things for i. In the first program, because a value is never put into i (because scanf makes no assignment for %d when the input contains the letter “a”), your program prints whatever data happened to be in [rbp-4] when main started. In the second program, your program prints whatever happened to be in [rbp-8].
What is in these stack locations is whatever was left behind by the start-up code that runs before main is called. This special start-up code sets up the C environment. It may do things with addresses in your program, and some addresses in your program are deliberately randomized in each execution by the program loader to foil attackers. (For further information, look into address space layout randomization.) It appears that when the start-up code is done, it leaves some address in [rbp-4] and zero in [rbp-8]. So your first program prints some address for i and your second program prints zero.
So, the differences in this case were not caused by using or not using the return value of scanf. They were caused by having more or fewer variables, resulting in changes in where things were put on the stack.
This can of course change if you upgrade your C implementation and a different version of the start-up code is used or the compiler generates different code. Turning on optimization in the compiler, as with the -O3 switch, is likely to change the behavior too.