Is this a GCC (mingw) / glibc bug - scanf with shorts?


Look at the following simple piece of code:

#include <stdio.h>

int main() 
{
   short x = 0, y = 0;
   scanf("%d", &x);
   scanf("%d", &y);
   printf("%d %d\n", x, y);
   return 0;
}

If you input 4 and 5 to this program, you'd expect to get 4 and 5 in the output. With GCC 4.6.2 on Windows (MinGW), it prints 0 and 5 instead. So I dug a bit deeper. This is the assembly code it generated:

movw    $0, 30(%esp)
movw    $0, 28(%esp)
leal    30(%esp), %eax
movl    %eax, 4(%esp)
movl    $LC0, (%esp)
call    _scanf
leal    28(%esp), %eax
movl    %eax, 4(%esp)
movl    $LC0, (%esp)
call    _scanf

While I haven't done much assembler coding, the above code does not look right. It seems to suggest that x is placed at offset 30 from esp and y at offset 28, and that their addresses are then passed to scanf. So, when scanf treats those addresses as pointing to 4-byte ints, the following should happen: the first call sets bytes [30, 34) to the value 0x00000004, and the second call sets bytes [28, 32) to the value 0x00000005. Since this is a little-endian machine, the first call stores [0x04 0x00 0x00 0x00] starting at byte 30, and the second stores [0x05 0x00 0x00 0x00] starting at byte 28. The second store overwrites bytes 30 and 31, which hold x, so x gets reset to 0.

I tried reversing the order of the scanf calls, and it worked (the output did come out as 4 and 5). Now the smaller offset is filled first and the larger offset after it, so the second store no longer overlaps the variable written by the first.

It seemed preposterous that GCC could have messed this up. So I tried MSVC, and the assembly it generated had one marked difference. The variables were placed at offsets -4 and -8 (i.e. they were considered as 4 bytes long, though the comment said 2 bytes). Here's part of the code:

_TEXT   SEGMENT
_x$ = -8    ; size = 2
_y$ = -4    ; size = 2
_main   PROC
    push    ebp
    mov ebp, esp
    sub esp, 8
    xor eax, eax
    mov WORD PTR _x$[ebp], ax
    xor ecx, ecx
    mov WORD PTR _y$[ebp], cx
    lea edx, DWORD PTR _x$[ebp]
    push    edx
    push    OFFSET $SG2470
    call    _scanf
    add esp, 8
    lea eax, DWORD PTR _y$[ebp]
    push    eax
    push    OFFSET $SG2471
    call    _scanf
    add esp, 8

My question is in two parts:

  • I don't have a personal Linux box at my disposal. Is this a GCC issue, or only a mingw issue?

But, more importantly,

  • Is this a bug at all? How would a compiler figure out if it should place "short"s at 2-byte offsets or 4-byte offsets?

Solution

  • To use scanf() on short, you must specify %hd in the format string.

    You're provoking overflows because you are lying to scanf(). With %d, scanf() stores a full int through the pointer it is given, which writes past the end of a short. Turn on the warnings (-Wall at least). You should get complaints from GCC about mismatches. (While you're learning C, use -Wall to catch the silly mistakes you make. When you've been programming in C for more than a quarter century like I have, you'll add some more flags to make sure you still aren't making silly mistakes. And you'll always make sure that the code compiles clean with -Wall.)

    GCC 4.7.1 on Mac OS X 10.7.5 says:

    ss.c:6:4: warning: format ‘%d’ expects argument of type ‘int *’, but argument 2 has type ‘short int *’ [-Wformat]
    ss.c:7:4: warning: format ‘%d’ expects argument of type ‘int *’, but argument 2 has type ‘short int *’ [-Wformat]