
Why does GCC (ARM Cortex-M0) generate a UXTB instruction when it should know that the data is already uint8?


I'm using a Cortex-M0 MCU from NXP (LPC845) and I'm trying to figure out what GCC is trying to do :)

Basically, the C code (pseudo) is as follows:

volatile uint8_t readb1 = 0x1a; // dummy
readb1 = GpioPadB(GPIO_PIN);

and the macro I wrote is

(*((volatile uint8_t*)(SOME_GPIO_ADDRESS)))

The code works, but GCC emits an extra UXTB instruction that I don't understand:

00000378:   ldrb    r3, [r3, #0]
0000037a:   ldr     r2, [pc, #200]  ; (0x444 <AppInit+272>)
0000037c:   uxtb    r3, r3
0000037e:   strb    r3, [r2, #0]
105         asm("nop");

My explanation is as follows:

  • LDRB loads a byte from the address in R3 and puts the result in R3. This is the byte read from the GPIO register.
  • LDR loads the address of the readb1 variable into R2.
  • UXTB zero-extends the low byte of R3 into R3. With a rotation of 0, that does nothing to a value that is already a zero-extended uint8.
  • STRB stores the low byte of R3 to the address in R2 (my variable).

Why does it do that?

First, the compiler should know that the value in R3 is only a byte (it already generates LDRB correctly). Second, STRB only stores bits 7..0 anyway, so why use UXTB?
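The reasoning above can be checked on a host machine with a small model of the three instructions. This is only a sketch: the `model_*` functions and `uxtb_is_redundant_after_ldrb` are hypothetical names introduced here, and they imitate the documented register-level behaviour of LDRB, UXTB (rotation 0) and STRB rather than running on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LDRB: read one byte and zero-extend it to a 32-bit register. */
static uint32_t model_ldrb(const volatile uint8_t *addr)
{
    return (uint32_t)*addr;
}

/* Model of UXTB with rotation 0: keep bits 7..0, clear bits 31..8. */
static uint32_t model_uxtb(uint32_t reg)
{
    return reg & 0xFFu;
}

/* Model of STRB: only bits 7..0 of the register reach memory. */
static void model_strb(volatile uint8_t *addr, uint32_t reg)
{
    *addr = (uint8_t)reg;
}

/* Returns 1 if, for every possible byte value, inserting UXTB between
 * LDRB and STRB changes neither the register nor the stored byte. */
static int uxtb_is_redundant_after_ldrb(void)
{
    for (unsigned v = 0; v <= 0xFFu; ++v) {
        volatile uint8_t src = (uint8_t)v;
        volatile uint8_t dst_with = 0;
        volatile uint8_t dst_without = 0;

        uint32_t r3 = model_ldrb(&src);
        assert((r3 >> 8) == 0);          /* LDRB already cleared bits 31..8 */

        model_strb(&dst_without, r3);
        model_strb(&dst_with, model_uxtb(r3));

        if (model_uxtb(r3) != r3 || dst_with != dst_without)
            return 0;
    }
    return 1;
}
```

Under this model the function returns 1, which matches the expectation that the UXTB between the LDRB and the STRB has no visible effect.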

Thanks for clarifications,

EDITED: Compiler version:

gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)

I use -O3


Solution

  • This looks like an extra instruction left in by the compiler, and/or there is some nuance to the Cortex-M or newer cores (I would love to know what that nuance is).

    #define GpioPadB(x) (*((volatile unsigned char *)(x)))
    volatile unsigned char readb1;
    void fun ( void )
    {
        readb1 = 0x1A;
        readb1 = GpioPadB(0x1234000);
    }
    

    With a GCC installed from apt:

    arm-none-eabi-gcc --version
    arm-none-eabi-gcc (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)
    Copyright (C) 2014 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    arm-none-eabi-gcc -O2 -c -mthumb so.c -o so.o
    arm-none-eabi-objdump -d so.o
    
    
    00000000 <fun>:
       0:   231a        movs    r3, #26
       2:   4a03        ldr     r2, [pc, #12]   ; (10 <fun+0x10>)
       4:   7013        strb    r3, [r2, #0]
       6:   4b03        ldr     r3, [pc, #12]   ; (14 <fun+0x14>)
       8:   781b        ldrb    r3, [r3, #0]
       a:   7013        strb    r3, [r2, #0]
       c:   4770        bx      lr
       e:   46c0        nop         ; (mov r8, r8)
      10:   00000000    .word   0x00000000
      14:   01234000    .word   0x01234000
    

    as one would expect.

    arm-none-eabi-gcc -O2 -c -mthumb -march=armv7-m so.c -o so.o
    arm-none-eabi-objdump -d so.o
    so.o:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00000000 <fun>:
       0:   4a03        ldr     r2, [pc, #12]   ; (10 <fun+0x10>)
       2:   211a        movs    r1, #26
       4:   4b03        ldr     r3, [pc, #12]   ; (14 <fun+0x14>)
       6:   7011        strb    r1, [r2, #0]
       8:   781b        ldrb    r3, [r3, #0]
       a:   b2db        uxtb    r3, r3
       c:   7013        strb    r3, [r2, #0]
       e:   4770        bx  lr
      10:   00000000    .word   0x00000000
      14:   01234000    .word   0x01234000
    

    with the extra uxtb instruction in there.

    With something a bit newer:

    arm-none-eabi-gcc --version
    arm-none-eabi-gcc (GCC) 10.2.0
    Copyright (C) 2020 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    

    for armv6m and armv7m

    00000000 <fun>:
       0:   231a        movs    r3, #26
       2:   4a03        ldr     r2, [pc, #12]   ; (10 <fun+0x10>)
       4:   7013        strb    r3, [r2, #0]
       6:   4b03        ldr     r3, [pc, #12]   ; (14 <fun+0x14>)
       8:   781b        ldrb    r3, [r3, #0]
       a:   7013        strb    r3, [r2, #0]
       c:   4770        bx      lr
       e:   46c0        nop         ; (mov r8, r8)
      10:   00000000    .word   0x00000000
      14:   01234000    .word   0x01234000
    

    for armv4t

    00000000 <fun>:
       0:   231a        movs    r3, #26
       2:   4a03        ldr     r2, [pc, #12]   ; (10 <fun+0x10>)
       4:   7013        strb    r3, [r2, #0]
       6:   4b03        ldr     r3, [pc, #12]   ; (14 <fun+0x14>)
       8:   781b        ldrb    r3, [r3, #0]
       a:   7013        strb    r3, [r2, #0]
       c:   4770        bx      lr
       e:   46c0        nop         ; (mov r8, r8)
      10:   00000000    .word   0x00000000
      14:   01234000    .word   0x01234000
    

    and the uxtb is gone.

    I think it is just a missed optimization, peephole or otherwise.

    As already answered, though, when you use variables narrower than a general-purpose register you can expect (and should tolerate) the compiler widening them to the register size. Whether it does so on the way in (when the variable is read) or on the way out (just before the value is written or used later) varies by compiler and target.

    For x86, where you can access portions of a register separately (or use memory operands), GCC does not do this, even in cases that clearly need a sign extension or padding. It sorts that out later, when the value is used.
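The widening described above has a counterpart at the C language level: operands narrower than `int` are promoted to `int` before arithmetic, and the result is narrowed again when it is stored in a `uint8_t`. The sketch below illustrates only that language rule; the function names are hypothetical, and it makes no claim about where a particular GCC version places its UXTB.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands are promoted to int before the addition,
 * so the result is not limited to 8 bits. */
static int add_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Converting the int result back to uint8_t keeps only the low 8 bits. */
static uint8_t add_truncated(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Small self-check: 200 + 100 is 300 as an int, and 300 mod 256 is 44. */
static void promotion_demo(void)
{
    assert(add_promoted(200, 100) == 300);
    assert(add_truncated(200, 100) == 44);
}
```

On a 32-bit ARM target `int` is the register width, so this narrowing step is the point where a compiler may need an explicit zero extension, or may skip it when a byte store follows.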

    You can search the GCC sources for uxtb and perhaps find the issue or a comment about it.

    EDIT

    Note that clang takes a different path: it spends extra cycles building the addresses with movw/movt pairs, but it does not emit the extension.

    00000000 <fun>:
       0:   f240 0000   movw    r0, #0
       4:   f2c0 0000   movt    r0, #0
       8:   211a        movs    r1, #26
       a:   7001        strb    r1, [r0, #0]
       c:   f244 0100   movw    r1, #16384  ; 0x4000
      10:   f2c0 1123   movt    r1, #291    ; 0x123
      14:   7809        ldrb    r1, [r1, #0]
      16:   7001        strb    r1, [r0, #0]
      18:   4770        bx  lr
    
    clang --version
    clang version 11.1.0 (https://github.com/llvm/llvm-project.git 1fdec59bffc11ae37eb51a1b9869f0696bfd5312)
    Target: armv7m-none-unknown-eabi
    Thread model: posix
    InstalledDir: /opt/llvm11armv7m/bin
    

    I think it is simply a missed optimization in GCC.