Search code examples
cassemblyembeddedvolatilebit-fields

Avoid volatile bit-field assignment expression reading or writing memory several times


I want to use volatile bit-field struct to set hardware register like following code

union foo {
    uint32_t value;
    struct {
        uint32_t x : 1;
        uint32_t y : 3;
        uint32_t z : 28;
    };
};
union foo f = {0};
int main()
{
    volatile union foo *f_ptr = &f;
    //union foo tmp;
    *f_ptr =  (union foo) {
        .x = 1,
        .y = 7,
        .z = 10,
    };
    //*f_ptr = tmp;
    return 0;
}

However, the compiler will make it to STR, LDR HW register several times. It is a terrible things that it will trigger hardware to work at once when the register is writed.

main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    movw    r3, #:lower16:.LANCHOR0
    movs    r0, #0
    movt    r3, #:upper16:.LANCHOR0
    ldr r2, [r3]
    orr r2, r2, #1
    str r2, [r3]
    ldr r2, [r3]
    orr r2, r2, #14
    str r2, [r3]
    ldr r2, [r3]
    and r2, r2, #15
    orr r2, r2, #160
    str r2, [r3]
    bx  lr
    .size   main, .-main
    .global f
    .bss
    .align  2

My gcc version is : arm-linux-gnueabi-gcc (Linaro GCC 4.9-2017.01) 4.9.4 and build with -O2 optimation


I have tried to use the local variable to resolve this problem

union foo {
    uint32_t value;
    struct {
        uint32_t x : 1;
        uint32_t y : 3;
        uint32_t z : 28;
    };
};
union foo f = {0};
int main()
{
    volatile union foo *f_ptr = &f;
    union foo tmp;
    tmp =  (union foo) {
        .x = 1,
        .y = 7,
        .z = 10,
    };
    *f_ptr = tmp;
    return 0;
}

Well, it will not STR to HW register several times

main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    movs    r1, #10
    movs    r2, #15
    movw    r3, #:lower16:.LANCHOR0
    bfi r2, r1, #4, #28
    movt    r3, #:upper16:.LANCHOR0
    movs    r0, #0
    str r2, [r3]
    bx  lr
    .size   main, .-main
    .global f
    .bss
    .align  2

I think it is still not a good idea to use local variable, considering the limitation of binary size for embedded system.

Is there any way to handle this problem without using local variable?



Solution

  • I think this is a bug in GCC. Per discussion below, you might consider using:

    f_ptr->value =  (union foo) {
            .x = 1,
            .y = 7,
            .z = 10,
        } .value;
    

    By the C standard, the code a compiler generates for a program may not access a volatile object when the original C code nominally does not access the object. The code *f_ptr = (union foo) { .x = 1, .y = 7, .z = 10, }; is a single assignment to *f_ptr. So we would expect this to generate a single store to *f_ptr; generating two stores is a violation of the standard’s requirements.

    We could consider an explanation for this to be that GCC is treating the aggregate (the union and/or the structure within it) as several objects, each individually volatile, rather than one aggregated volatile object.1 But, if this were so, then it ought to generate separate 16-bit strh instructions for the parts (per the original example code, which had 16-bit parts), not the 32-bit str instructions we see.

    While using a local variable appears to work around the issue, I would not rely on that, because the assignment of the compound literal above is semantically equivalent, so the cause of why GCC generates broken assembly code for one sequence of code and not the other is unclear. With different circumstances (such as additional or modified code in the function or other variations that might affect optimization), GCC might generate broken code with the local variable too.

    What I would do is avoid using an aggregate for the volatile object. The hardware register is, presumably, physically more like a 32-bit unsigned integer than like a structure of bit-fields (even though semantically it is defined with bit-fields). So I would define the register as volatile uint32_t and use that type when assigning values to it. Those values could be prepared with bit shifts or structures with bit-fields or whatever other method you prefer.

    It should not be necessary to avoid using local variables, as the optimizer should effectively eliminate them. However, if you wish to neither change the register definition nor use local variables, an alternative is the code I opened with:

    f_ptr->value =  (union foo) {
            .x = 1,
            .y = 7,
            .z = 10,
        } .value;
    

    That prepares the value to be stored but then assigns it using the uint32_t member of the union rather than using the whole union, and testing with ARM GCC 4.6.4 on Compiler Explorer (the closest match I could find on Compiler Explorer to what you are using) suggests it generates a single store with minimal code:

    main:
            ldr     r3, .L2
            mov     r2, #175
            str     r2, [r3, #0]
            mov     r0, #0
            bx      lr
    .L2:
            .word   .LANCHOR0
    .LANCHOR0 = . + 0
    f:
    

    Footnote

    1 I would consider this to a bug too, as the C standard does not make provision for applying the volatile qualifier on a union or structure declaration as applying to the members rather than to the whole aggregate. For arrays, it does say that qualifiers apply to the elements, rather than the whole array (C 2018 6.7.3 10). It has no such wording for unions or structures.