I'm compiling C code for a 32-bit processor with gcc. It works fine with -O0 optimization, but with -O1 (I also tried -Ofast) it produces what looks like incorrect output.
void foo()
{
    volatile unsigned int *reg = (volatile unsigned int *)0x1000;
    unsigned int reg_value;
    unsigned int busy;

    do {
        reg_value = *reg;
        busy = (reg_value & 0x80000000U);
    } while (busy == 0);
}
With -O1 the compiler produces:
1030cea6 <foo>:
1030cea6: a1 00 10 00 00 mov 0x1000,%eax
1030ceab: 85 c0 test %eax,%eax
1030cead: 79 f7 jns 1030cea6 <foo>
1030ceaf: c3 ret
The problem with this output, as I see it, is that test %eax,%eax checks all 32 bits, not just bit 31.
With -O0 the compiler produces:
10312b6d: 55 push %ebp
10312b6e: 89 e5 mov %esp,%ebp
10312b70: 83 ec 10 sub $0x10,%esp
10312b73: c7 45 fc 00 10 00 00 movl $0x1000,-0x4(%ebp)
10312b7a: 8b 45 fc mov -0x4(%ebp),%eax
10312b7d: 8b 00 mov (%eax),%eax
10312b7f: 89 45 f8 mov %eax,-0x8(%ebp)
10312b82: 8b 45 f8 mov -0x8(%ebp),%eax
10312b85: 25 00 00 00 80 and $0x80000000,%eax
10312b8a: 89 45 f4 mov %eax,-0xc(%ebp)
10312b8d: 83 7d f4 00 cmpl $0x0,-0xc(%ebp)
10312b91: 74 e7 je 10312b7a <foo+0xd>
10312b93: 90 nop
10312b94: c9 leave
10312b95: c3 ret
This output looks fine, since the instruction and $0x80000000,%eax restricts the check to just bit 31.
If I change the code to check bit 30 instead of bit 31 (busy = (reg_value & 0x40000000U)), -O1 produces correct output:
1030cea6: a1 00 10 00 00 mov 0x1000,%eax
1030ceab: a9 00 00 00 40 test $0x40000000,%eax
1030ceb0: 74 f4 je 1030cea6 <foo>
1030ceb2: c3 ret
My guess is that this is related to signedness, but my variables are all unsigned.
My question is: how do I get correct compiler output (which actually restricts the check to bit 31 only) with -O1?
This is a perfectly correct optimization. test eax, eax computes eax AND eax (which is just eax) and sets SF, the sign flag, to the most significant bit of that result; jns jumps if SF = 0. The function therefore loops while the MSB of eax is not set, which is precisely what you wanted.
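To see why the two forms are interchangeable, here is a minimal sketch in C that compares the mask test from the question with a model of the sign-flag test. The helper names busy_by_mask, busy_by_sign and check_forms_agree are made up for illustration and are not part of the original code. The sketch assumes gcc's documented behaviour for converting an out-of-range unsigned value to a signed type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the busy test exactly as written in the question. */
static int busy_by_mask(uint32_t v)
{
    return (v & 0x80000000U) != 0;
}

/* Hypothetical helper: models SF after "test %eax,%eax".
 * SF is a copy of bit 31, so it is set exactly when the value is
 * negative when viewed as a signed 32-bit integer.
 * Assumption: the uint32_t -> int32_t conversion is implementation-defined
 * in ISO C; gcc documents it as reduction modulo 2^32. */
static int busy_by_sign(uint32_t v)
{
    return (int32_t)v < 0;
}

/* Asserts that both forms agree on a few boundary values. */
static void check_forms_agree(void)
{
    static const uint32_t samples[] = {
        0x00000000U, 0x00000001U, 0x40000000U, 0x7FFFFFFFU,
        0x80000000U, 0x80000001U, 0xFFFFFFFFU
    };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(busy_by_mask(samples[i]) == busy_by_sign(samples[i]));
    }
}
```

Calling check_forms_agree() from a small main should return silently, since none of the lower 31 bits can influence either result. For bit 30 there is no flag that mirrors the bit directly, which is presumably why the compiler falls back to test $0x40000000,%eax followed by je in that case.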