Search code examples
cgcccompiler-optimizationcompiler-bug

Am I seeing an optimisation bug when iterating through a list in C?


I have been working on integrating a colleague's software library into our larger application. He has been writing and testing his library under -O0 on gcc 4.9.3. This is embedded software for an alarm system. This bug is observed under -Os optimisation and we are using straight C (none of that C++ nonsense), also on gcc 4.9.3. The architecture is ARM Cortex-M4F.

I am experiencing an issue with integrating that code into the larger stack. The below code should iterate through at most GLOBAL_MAX_DEVICES until it finds a free space in the table to insert its entry:

RET_VALUES DEVICE_Add_To_New_Table( uint8_t *new_id, uint16_t unique_id )
{
    RET_VALUES ret_value = RET_OK;
    uint8_t pos = 0;

[...]
    else
    {
        debug_print( "A,", true, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );

        // See if the unique_id is already in the table (relearn)
        while(( unique_id != DEVICE_new_table.unique_id[pos] ) && ( pos < GLOBAL_MAX_DEVICES ))
        {
            debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
            pos++;
        }
[...]

The problem we are experiencing is that instead of the loop iterating GLOBAL_MAX_DEVICES times (currently 13) it iterates over 200 times. The sequence "B," is printed out multiple times in the while(( unique_id != DEVICE_new_table.unique_id[pos] ) && ( pos < GLOBAL_MAX_DEVICES )) loop.

Three changes can be made which resolve this issue;

  • pos can be made a volatile, which (I believe) inhibits some optimisation on this value.

  • The parameters in the while block can be reversed, forcing evaluation of pos < GLOBAL_MAX_DEVICES to occur first.

  • It is possible to add a build flag, -fno-aggressive-loop-optimizations, to turn off some loop optimisations.

Only one given fix is required, it appears, to resolve the issue.

I initially thought the issue may be down to a short-circuit optimisation. But, I can confirm that the unique_id parameter is not zero (it is 0x042b on this sample), and DEVICE_new_table is initialised to all zeroes in the unique_id field of the struct. And, in any case, even if the first argument is true, the latter should still be evaluated; short-circuiting should only apply if the first is false, which is clearly not the case, at least not initially, because the loop is iterating for some time.

I am therefore confused as to why this loop iterates hundreds of times (I presume eventually a collision with a memory address occurs, which causes a chance match against the unique_id, possibly with a value in the stack.) Adding a print statement into the loop that prints the iterator pos causes the loop to run over 13,000 times. 13,000 is much bigger than 13, I have checked several times.

Comparing the machine code produced by the broken sample (with additional printf statement added) and working sample (also with printf) shows that the working variant, with the arguments in the boolean operator reversed, includes a comparison with the iteration maximum of 13.

Broken:

        debug_print( "A,", true, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
    ea7a:   4822        ldr r0, [pc, #136]  ; (eb04 <DEVICE_Add_To_New_Table+0xb0>)
    ea7c:   f8df 80a4   ldr.w   r8, [pc, #164]  ; eb24 <DEVICE_Add_To_New_Table+0xd0>
    ea80:   2101        movs    r1, #1
    ea82:   2200        movs    r2, #0
    ea84:   2306        movs    r3, #6
    ea86:   f00b fef9   bl  1a87c <debug_print>

        // See if the unique_id is already in the table (relearn)
        while(( unique_id != DEVICE_new_table.unique_id[pos] ) && ( pos < GLOBAL_MAX_DEVICES ))
    ea8a:   2400        movs    r4, #0
    ea8c:   f838 3f02   ldrh.w  r3, [r8, #2]!
    ea90:   454b        cmp r3, r9
    ea92:   b2e7        uxtb    r7, r4
    ea94:   4626        mov r6, r4
    ea96:   f104 0401   add.w   r4, r4, #1
    ea9a:   d00c        beq.n   eab6 <DEVICE_Add_To_New_Table+0x62>
        {
            uart_printf(-1, "%d,", pos);
    ea9c:   4632        mov r2, r6
    ea9e:   491a        ldr r1, [pc, #104]  ; (eb08 <DEVICE_Add_To_New_Table+0xb4>)
    eaa0:   f04f 30ff   mov.w   r0, #4294967295
    eaa4:   f7f8 ff6c   bl  7980 <uart_printf>
            debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
    eaa8:   2100        movs    r1, #0
    eaaa:   4818        ldr r0, [pc, #96]   ; (eb0c <DEVICE_Add_To_New_Table+0xb8>)
    eaac:   460a        mov r2, r1
    eaae:   2306        movs    r3, #6
    eab0:   f00b fee4   bl  1a87c <debug_print>
    eab4:   e7ea        b.n ea8c <DEVICE_Add_To_New_Table+0x38>
            pos++;
        }

Working (trimmed unnecessary parts), note the cmp r4, #13 statement at eab2 which is absent in the broken sample:

        debug_print( "A,", true, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
    ea7a:   4845        ldr r0, [pc, #276]  ; (eb90 <DEVICE_Add_To_New_Table+0x13c>)
    ea7c:   f8df 8148   ldr.w   r8, [pc, #328]  ; ebc8 <DEVICE_Add_To_New_Table+0x174>
    ea80:   2101        movs    r1, #1
    ea82:   2200        movs    r2, #0
    ea84:   2306        movs    r3, #6
    ea86:   f00b ff4d   bl  1a924 <debug_print>
    ea8a:   4647        mov r7, r8
    ea8c:   2400        movs    r4, #0

        // See if the unique_id is already in the table (relearn)
        while(( pos < GLOBAL_MAX_DEVICES ) && ( unique_id != DEVICE_new_table.unique_id[pos] ))
    ea8e:   f837 3f02   ldrh.w  r3, [r7, #2]!
    ea92:   454b        cmp r3, r9
    ea94:   b2e5        uxtb    r5, r4
    ea96:   d00f        beq.n   eab8 <DEVICE_Add_To_New_Table+0x64>
        {
            uart_printf(-1, "%d,", pos);
    ea98:   4622        mov r2, r4
    ea9a:   493e        ldr r1, [pc, #248]  ; (eb94 <DEVICE_Add_To_New_Table+0x140>)
    ea9c:   f04f 30ff   mov.w   r0, #4294967295
    eaa0:   f7f8 ff6e   bl  7980 <uart_printf>
            debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
    eaa4:   2100        movs    r1, #0
    eaa6:   483c        ldr r0, [pc, #240]  ; (eb98 <DEVICE_Add_To_New_Table+0x144>)
    eaa8:   460a        mov r2, r1
    eaaa:   2306        movs    r3, #6
    eaac:   3401        adds    r4, #1
    eaae:   f00b ff39   bl  1a924 <debug_print>

        // See if the unique_id is already in the table (relearn)
        while(( pos < GLOBAL_MAX_DEVICES ) && ( unique_id != DEVICE_new_table.unique_id[pos] ))
    eab2:   2c0d        cmp r4, #13
    eab4:   d1eb        bne.n   ea8e <DEVICE_Add_To_New_Table+0x3a>
    eab6:   4625        mov r5, r4
        {
            uart_printf(-1, "%d,", pos);
            debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
            pos++;
        }

I feel like a programmer blaming his tools is admitting defeat, but I really can't see how this is anything other than a compiler bug. If anyone has any ideas, I would appreciate the input.


Solution

  • As indicated in the comments, this is bug in your program, namely a simple out-of-bounds access of the array.

    Given GLOBAL_MAX_DEVICES = 13, then you end up accessing DEVICE_new_table.unique_id[13] and beyond. In debug build you might have a zero there and it "works" by chance. Then in release you get some non-zero value and the loop goes haywire, short-circuiting away the loop terminating condition.

    Also, if your code contains undefined behavior such as out-of-bounds access, you can get really strange machine code generated when the optimizer attempts things like loop unrolling. Like for example the compiler may reason as: "okay this first check is always true since all items in the array are zero, so we can remove the check after && entirely".

    This could be fixed by writing the loop in simpler ways. Assuming you wish to preserve the value of pos after the loop(?), then do this:

    uint8_t pos = 0;
    ...
    for(; pos<GLOBAL_MAX_DEVICES; pos++)
    {
      if(unique_id != DEVICE_new_table.unique_id[pos])
        break;
      debug_print(...);
    }