
Check to see if bit 31 is set in an unsigned int with gcc -O1 optimization


I'm compiling C code for a 32-bit processor with gcc. It works fine with -O0; however, with -O1 (I also tried -Ofast) it appears to produce incorrect output.

void foo()
{
    volatile unsigned int *reg = (volatile unsigned int *)0x1000;
    unsigned int reg_value;
    unsigned int busy;

    do {
        reg_value = *reg;
        busy = (reg_value & 0x80000000U);
    } while (busy == 0);
}

With -O1 the compiler produces:

1030cea6 <foo>:
1030cea6:   a1 00 10 00 00          mov    0x1000,%eax
1030ceab:   85 c0                   test   %eax,%eax
1030cead:   79 f7                   jns    1030cea6 <foo>
1030ceaf:   c3                      ret

The problem with this output, as I read it, is that 'test %eax,%eax' checks all 32 bits, not just bit 31.

With -O0 the compiler produces:

10312b6d:   55                      push   %ebp
10312b6e:   89 e5                   mov    %esp,%ebp
10312b70:   83 ec 10                sub    $0x10,%esp
10312b73:   c7 45 fc 00 10 00 00    movl   $0x1000,-0x4(%ebp)
10312b7a:   8b 45 fc                mov    -0x4(%ebp),%eax
10312b7d:   8b 00                   mov    (%eax),%eax
10312b7f:   89 45 f8                mov    %eax,-0x8(%ebp)
10312b82:   8b 45 f8                mov    -0x8(%ebp),%eax
10312b85:   25 00 00 00 80          and    $0x80000000,%eax
10312b8a:   89 45 f4                mov    %eax,-0xc(%ebp)
10312b8d:   83 7d f4 00             cmpl   $0x0,-0xc(%ebp)
10312b91:   74 e7                   je     10312b7a <foo+0xd>
10312b93:   90                      nop
10312b94:   c9                      leave
10312b95:   c3                      ret

This output looks fine, as the 'and $0x80000000,%eax' restricts the check to just bit 31.

If I change the code to check bit 30 instead of bit 31 (busy = (reg_value & 0x40000000U)), -O1 produces correct output:

1030cea6:   a1 00 10 00 00          mov    0x1000,%eax
1030ceab:   a9 00 00 00 40          test   $0x40000000,%eax
1030ceb0:   74 f4                   je     1030cea6 <foo>
1030ceb2:   c3                      ret

My guess is that this is related to signedness; however, my variables are all unsigned.

My question is: how do I produce correct compiler output (which actually restricts the check to bit 31 only) with -O1?


Solution

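  • This is a perfectly correct optimization. test %eax,%eax ANDs eax with itself and sets SF (the sign flag) to the most significant bit of the result, which is bit 31 of eax. The other 31 bits only influence flags such as ZF and PF, which jns does not read. jns jumps if SF = 0, and therefore the function will loop while the MSB of eax is not set (which is precisely what you wanted). Bit 30 has no dedicated flag, which is why that variant is compiled to test $0x40000000,%eax followed by je.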
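
The equivalence the optimizer relies on can be checked in C itself. The following is a minimal sketch, not part of the original question: the helper names busy_by_mask and busy_by_sign are hypothetical, and the cast to int32_t assumes gcc's documented wrap-around behaviour for out-of-range unsigned-to-signed conversion on a two's complement target.

```c
#include <stdint.h>

/* Mask test, as written in the question: true iff bit 31 is set. */
static int busy_by_mask(uint32_t v)
{
    return (v & 0x80000000U) != 0;
}

/* Sign test, mirroring what test/jns does: bit 31 is the sign bit when
 * the same 32 bits are read as a two's complement integer.
 * Assumption: the conversion wraps modulo 2^32, as gcc defines it. */
static int busy_by_sign(uint32_t v)
{
    return (int32_t)v < 0;
}
```

Both helpers should agree for every 32-bit input, so a compiler is free to emit either form. The unsigned types in the source do not prevent this, because the sign flag is just a copy of bit 31 of the result and carries no notion of the C type.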