Look at the following simple piece of code:
int main()
{
short x = 0, y = 0;
scanf("%d", &x);
scanf("%d", &y);
printf("%d %d\n", x, y);
return 0;
}
If you input 4 and 5 to this program, you'd expect to get 4 and 5 in the output. With GCC 4.6.2 on Windows (MinGW), it produces 0 and 5 instead. So I dug a bit deeper. This is the assembly code it generated:
movw $0, 30(%esp)
movw $0, 28(%esp)
leal 30(%esp), %eax
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _scanf
leal 28(%esp), %eax
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _scanf
While I haven't done much assembly programming, the above code does not look right. It places x at offset 30 from esp and y at offset 28, and then passes their addresses to scanf. If scanf treats each address as pointing to a 4-byte integer, the following should happen: the first call sets bytes [30, 34) to the value 0x00000004, and the second call sets bytes [28, 32) to the value 0x00000005. Since this is a little-endian machine, the first call stores [0x04 0x00 0x00 0x00] starting at byte 30, and the second stores [0x05 0x00 0x00 0x00] starting at byte 28. The second store overwrites bytes 30 and 31 with zeros, which resets x to 0.
I tried reversing the order of the scanf calls, and it worked (the output did come out as 4 and 5), because now the smaller offset was filled first, and then the larger one.
It seemed preposterous that GCC could have messed this up. So I tried MSVC, and the assembly it generated had one marked difference: the variables were placed at offsets -8 and -4 (i.e. they were given 4 bytes each, though the comments say the size is 2 bytes). Here's part of the code:
_TEXT SEGMENT
_x$ = -8 ; size = 2
_y$ = -4 ; size = 2
_main PROC
push ebp
mov ebp, esp
sub esp, 8
xor eax, eax
mov WORD PTR _x$[ebp], ax
xor ecx, ecx
mov WORD PTR _y$[ebp], cx
lea edx, DWORD PTR _x$[ebp]
push edx
push OFFSET $SG2470
call _scanf
add esp, 8
lea eax, DWORD PTR _y$[ebp]
push eax
push OFFSET $SG2471
call _scanf
add esp, 8
My question is in two parts:
But, more importantly,
To use scanf() on short, you must specify %hd in the format string.
You're provoking overflows because you are lying to scanf(). Turn on the warnings (-Wall at least). You should get complaints from GCC about mismatches. (While you're learning C, use -Wall to catch the silly mistakes you make. When you've been programming in C for more than a quarter century like I have, you'll add some more flags to make sure you still aren't making silly mistakes. And you'll always make sure that the code compiles clean with -Wall.)
GCC 4.7.1 on Mac OS X 10.7.5 says:
ss.c:6:4: warning: format ‘%d’ expects argument of type ‘int *’, but argument 2 has type ‘short int *’ [-Wformat]
ss.c:7:4: warning: format ‘%d’ expects argument of type ‘int *’, but argument 2 has type ‘short int *’ [-Wformat]