
How can I get clang compiler / lld linker for riscv32 to not use an lui for every memory address in the same range?


I'm writing a standalone program for a RISC-V core that makes many reads and writes to two different memory ranges. (One range is for RAM, and the other is for peripherals. Maybe there will be more ranges later.) The generated assembly code has many unnecessary lui instructions.

As a basic example, I'll have C code that says:

a=40;
b=80;
c=0;

When compiled I get:

lui a0, 0x1
li a1, 0x28
sw a1, 0x2c(a0)

lui a0, 0x1
li a2, 0x50
sw a2, 0x30(a0)

lui a0, 0x1
sw zero, 0x34(a0)

The generated assembly code uses lui to load the base address 0x1000 into a0 three times in a row, without changing a0 in between. (There are also no jumps into this code from other places.) It would be better to leave the base address in a0 after the first lui and not reload it later.

I can partly fix this by adding this option to my clang command line:

--for-linker --relax-gp

Then I can set up the global pointer in my linker script and startup code, as in the SiFive guide to the global pointer.

This fixes the problem for this basic example, by using the global pointer as the base address in memory, and then storing at offsets from the global pointer.

a=40;

now becomes

li a1, 0x28
sw a1, -0x7d4(gp)

That's great if I only have one memory range that I'd like to write to. In my case I have a second memory range that isn't near the RAM data address range.

Is there some way to have a second global pointer that clang / lld can use?

Even more generally, if more address ranges are used, we can't reserve a register as a global pointer for each of them. In that case, is there some option to tell clang / lld to leave base addresses in their registers until those registers are needed for something else within a function / interrupt?


Solution

  • @ErikEidt's idea of using a struct pointer to address a memory range worked really well. It removed the redundant loads of the upper / base address (the repeated lui instructions).

    I defined my struct as

    struct devices
    {
    unsigned int a;
    unsigned int b;
    unsigned int c;
    };
    

    As a global, outside of my main I declared:

    struct devices *dv;
    

    In my main I set my struct pointer to the address of my peripherals:

    dv = (struct devices *)0x00080000u;
    

    (Also dv = (void *)0x00080000u; worked.)

    Then in my main I could set variables in the struct like

    dv->a=40;
    dv->b=80;
    dv->c=0;
    

    I compiled and assembled with

    clang -target riscv32 -march=rv32imc_zbb_zicsr_zbs -fuse-ld=lld --gcc-toolchain=/$HOME/toolchains -static -o test.bit -nostdlib -nostartfiles -T link.lds -O3 -ffreestanding -g -ggdb --for-linker --relax-gp hello.c
    

    I got...

    ;Set the pointer:
    lui a1,0x80
    sw a1, -0x800(gp)
    
    ;dv->a;
    li a0, 0x28
    sw a0, 0x2c(a1)
    
    ;dv->b;
    li a2, 0x50
    sw a2, 0x30(a1)
    
    ;dv->c;
    sw zero, 0x34(a1)
    

    Note that the lui is no longer repeated for each memory write. Because dv is a global, the next time this address space is accessed, the generated code reloads the pointer with:

    lw a0, -0x800(gp)
    

    Between using a struct for addressing memory regions and using a global pointer as in the SiFive guide to global pointers, my code ran 24% faster.

    Thanks again @ErikEidt .

    (I couldn't find an option to make clang / lld do this automatically, that is, to identify the repeated, unchanged base-address loads into registers (the lui instructions) and remove them from the generated assembly code.)
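
The struct approach described above can be sketched as a small self-contained C file. This is only an illustration under a few assumptions: the names devices_init and DEVICES_BASE are hypothetical, and the volatile qualifier is not in the original answer. It is added here on the assumption that the range holds memory-mapped peripheral registers, where the compiler must not merge or drop stores. Taking the base pointer as a parameter also lets the same code be exercised on a host machine against an ordinary struct.

```c
/* Layout of the peripheral block, as defined in the answer above. */
struct devices
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
};

/* Base address taken from the answer; only meaningful on the target. */
#define DEVICES_BASE 0x00080000u

/*
 * Hypothetical helper: every store goes through the same base pointer,
 * so the compiler can hold that pointer in one register and address
 * each field with a small constant offset.
 * volatile is an assumption added for memory-mapped registers.
 */
void devices_init(volatile struct devices *dv)
{
    dv->a = 40;
    dv->b = 80;
    dv->c = 0;
}
```

On the target, the call would look like devices_init((volatile struct devices *)DEVICES_BASE);. Whether the compiler then emits a single lui for the whole function depends on the optimization level and the surrounding code, so it is worth checking the disassembly.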