I have been working on integrating a colleague's software library into our larger application. He has been writing and testing his library under -O0
on gcc 4.9.3
. This is embedded software for an alarm system. This bug is observed under -Os
optimisation and we are using straight C (none of that C++ nonsense), also on gcc 4.9.3
. The architecture is ARM Cortex-M4F.
I am experiencing an issue with integrating that code into the larger stack. The below code should iterate through at most GLOBAL_MAX_DEVICES
until it finds a free space in the table to insert its entry:
RET_VALUES DEVICE_Add_To_New_Table( uint8_t *new_id, uint16_t unique_id )
{
RET_VALUES ret_value = RET_OK;
uint8_t pos = 0;
[...]
else
{
debug_print( "A,", true, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
// See if the unique_id is already in the table (relearn)
while(( unique_id != DEVICE_new_table.unique_id[pos] ) && ( pos < GLOBAL_MAX_DEVICES ))
{
debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
pos++;
}
[...]
The problem we are experiencing is that instead of the loop iterating GLOBAL_MAX_DEVICES
times (currently 13) it iterates over 200 times. The sequence "B," is printed out multiple times in the while(( unique_id != DEVICE_new_table.unique_id[pos] ) && ( pos < GLOBAL_MAX_DEVICES ))
loop.
Three changes can be made which resolve this issue;
pos
can be made a volatile, which (I believe) inhibits some optimisation on this value.
The parameters in the while
block can be reversed, forcing evaluation of pos < GLOBAL_MAX_DEVICES
to occur first.
It is possible to add a build flag, -fno-aggressive-loop-optimizations
, to turn off some loop optimisations.
Only one given fix is required, it appears, to resolve the issue.
I initially thought the issue may be down to a short-circuit optimisation. But, I can confirm that the unique_id
parameter is not zero (it is 0x042b on this sample), and DEVICE_new_table
is initialised to all zeroes in the unique_id
field of the struct. And, in any case, even if the first argument is true, the latter should still be evaluated; short-circuiting should only apply if the first is false, which is clearly not the case, at least not initially, because the loop is iterating for some time.
I am therefore confused as to why this loop iterates hundreds of times (I presume eventually a collision with a memory address occurs, which causes a chance match against the unique_id, possibly with a value in the stack.) Adding a print statement into the loop that prints the iterator pos
causes the loop to run over 13,000 times. 13,000 is much bigger than 13, I have checked several times.
Comparing the machine code produced by the broken sample (with additional printf statement added) and working sample (also with printf) shows that the working variant, with the arguments in the boolean operator reversed, includes a comparison with the iteration maximum of 13
.
Broken:
debug_print( "A,", true, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
ea7a: 4822 ldr r0, [pc, #136] ; (eb04 <DEVICE_Add_To_New_Table+0xb0>)
ea7c: f8df 80a4 ldr.w r8, [pc, #164] ; eb24 <DEVICE_Add_To_New_Table+0xd0>
ea80: 2101 movs r1, #1
ea82: 2200 movs r2, #0
ea84: 2306 movs r3, #6
ea86: f00b fef9 bl 1a87c <debug_print>
// See if the unique_id is already in the table (relearn)
while(( unique_id != DEVICE_new_table.unique_id[pos] ) && ( pos < GLOBAL_MAX_DEVICES ))
ea8a: 2400 movs r4, #0
ea8c: f838 3f02 ldrh.w r3, [r8, #2]!
ea90: 454b cmp r3, r9
ea92: b2e7 uxtb r7, r4
ea94: 4626 mov r6, r4
ea96: f104 0401 add.w r4, r4, #1
ea9a: d00c beq.n eab6 <DEVICE_Add_To_New_Table+0x62>
{
uart_printf(-1, "%d,", pos);
ea9c: 4632 mov r2, r6
ea9e: 491a ldr r1, [pc, #104] ; (eb08 <DEVICE_Add_To_New_Table+0xb4>)
eaa0: f04f 30ff mov.w r0, #4294967295
eaa4: f7f8 ff6c bl 7980 <uart_printf>
debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
eaa8: 2100 movs r1, #0
eaaa: 4818 ldr r0, [pc, #96] ; (eb0c <DEVICE_Add_To_New_Table+0xb8>)
eaac: 460a mov r2, r1
eaae: 2306 movs r3, #6
eab0: f00b fee4 bl 1a87c <debug_print>
eab4: e7ea b.n ea8c <DEVICE_Add_To_New_Table+0x38>
pos++;
}
Working (trimmed unnecessary parts), note the cmp r4, #13
statement at eab2
which is absent in the broken sample:
debug_print( "A,", true, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
ea7a: 4845 ldr r0, [pc, #276] ; (eb90 <DEVICE_Add_To_New_Table+0x13c>)
ea7c: f8df 8148 ldr.w r8, [pc, #328] ; ebc8 <DEVICE_Add_To_New_Table+0x174>
ea80: 2101 movs r1, #1
ea82: 2200 movs r2, #0
ea84: 2306 movs r3, #6
ea86: f00b ff4d bl 1a924 <debug_print>
ea8a: 4647 mov r7, r8
ea8c: 2400 movs r4, #0
// See if the unique_id is already in the table (relearn)
while(( pos < GLOBAL_MAX_DEVICES ) && ( unique_id != DEVICE_new_table.unique_id[pos] ))
ea8e: f837 3f02 ldrh.w r3, [r7, #2]!
ea92: 454b cmp r3, r9
ea94: b2e5 uxtb r5, r4
ea96: d00f beq.n eab8 <DEVICE_Add_To_New_Table+0x64>
{
uart_printf(-1, "%d,", pos);
ea98: 4622 mov r2, r4
ea9a: 493e ldr r1, [pc, #248] ; (eb94 <DEVICE_Add_To_New_Table+0x140>)
ea9c: f04f 30ff mov.w r0, #4294967295
eaa0: f7f8 ff6e bl 7980 <uart_printf>
debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
eaa4: 2100 movs r1, #0
eaa6: 483c ldr r0, [pc, #240] ; (eb98 <DEVICE_Add_To_New_Table+0x144>)
eaa8: 460a mov r2, r1
eaaa: 2306 movs r3, #6
eaac: 3401 adds r4, #1
eaae: f00b ff39 bl 1a924 <debug_print>
// See if the unique_id is already in the table (relearn)
while(( pos < GLOBAL_MAX_DEVICES ) && ( unique_id != DEVICE_new_table.unique_id[pos] ))
eab2: 2c0d cmp r4, #13
eab4: d1eb bne.n ea8e <DEVICE_Add_To_New_Table+0x3a>
eab6: 4625 mov r5, r4
{
uart_printf(-1, "%d,", pos);
debug_print( "B,", false, DEBUG_DEVELOP_MODE_PLAIN, DEBUG_TRACE );
pos++;
}
I feel like a programmer blaming his tools is admitting defeat, but I really can't see how this is anything other than a compiler bug. If anyone has any ideas, I would appreciate the input.
As indicated in the comments, this is bug in your program, namely a simple out-of-bounds access of the array.
Given GLOBAL_MAX_DEVICES
= 13, then you end up accessing DEVICE_new_table.unique_id[13]
and beyond. In debug build you might have a zero there and it "works" by chance. Then in release you get some non-zero value and the loop goes haywire, short-circuiting away the loop terminating condition.
Also, if your code contains undefined behavior such as out-of-bounds access, you can get really strange machine code generated when the optimizer attempts things like loop unrolling. Like for example the compiler may reason as: "okay this first check is always true since all items in the array are zero, so we can remove the check after && entirely".
This could be fixed by writing the loop in simpler ways. Assuming you wish to preserve the value of pos
after the loop(?), then do this:
uint8_t pos = 0;
...
for(; pos<GLOBAL_MAX_DEVICES; pos++)
{
if(unique_id != DEVICE_new_table.unique_id[pos])
break;
debug_print(...);
}