Address offset in RISC-V load instructions hardcoded or not?


For educational purposes I used https://godbolt.org/z/7F-Lhm to translate

// C++ code
char i = 3;
char A[] = {0,1,2,3,4,5};  
int myfunction() {
    return A[i];
}

into

# RISC-V instructions 
myfunction():                        # @myfunction()
        lui     a0, %hi(i)
        lbu     a0, %lo(i)(a0)
        lui     a1, %hi(A)
        addi    a1, a1, %lo(A)
        add     a0, a0, a1
        lbu     a0, 0(a0)
        ret
i:
        .byte   3                       # 0x3

A:
        .ascii  "\000\001\002\003\004\005"

But why is A[i] loaded with add a0, a0, a1 followed by lbu a0, 0(a0), and not just with lbu a0, a0(a1)?

It would make sense if, for lbu dest, offset(baseAddress), only dest and baseAddress are allowed to be registers, whereas offset is a hardcoded number in the instruction word itself. But in the same code above I see lbu a0, %lo(i)(a0), so offset can apparently also be "somewhat variable"?

Maybe the reason I don't understand this is that I don't really understand why this %hi/%lo thing is necessary in the first place. Why are we doing lui a0, %hi(i) followed by lbu a0, %lo(i)(a0) instead of just lbu a0, 0(i)?


Solution

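  • As Erik Eidt said, i is a global variable, i.e. it resides somewhere in the 32/64-bit addressable memory and can change at any time.

    The 32/64-bit address of i is loaded in two parts because a full address does not fit into the immediate field of a single instruction: lui supplies a 20-bit upper part, and the load (or addi) supplies a signed 12-bit immediate. %hi(i) and %lo(i) are those upper and lower parts of the address of i. Both are constants that the assembler/linker writes into the instruction word, so the offset in lbu a0, %lo(i)(a0) is still hardcoded; it only looks variable because it is written symbolically. The offset can never be a register, which is why A[i] needs a separate add before lbu a0, 0(a0). The value of i itself is loaded from memory on every call since it might have changed between calls to myfunction().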
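
The two-part address can be illustrated with a small C++ sketch. It assumes a 32-bit address, a 20-bit upper part as used by lui, and a low part that the hardware treats as a signed 12-bit immediate (-2048..2047). The names hi20, lo12, rebuild and the sample address are made up for illustration and are not part of any toolchain. The checks are evaluated at compile time.

```cpp
#include <cstdint>

// Hypothetical helper: the 20-bit value that %hi(addr) would put into lui.
// Adding 0x800 first compensates for the sign extension of the low part.
constexpr std::uint32_t hi20(std::uint32_t addr) {
    return (addr + 0x800u) >> 12;
}

// Hypothetical helper: the low 12 bits read as a signed immediate.
// (x ^ 0x800) - 0x800 sign-extends a 12-bit value.
constexpr std::int32_t lo12(std::uint32_t addr) {
    return static_cast<std::int32_t>((addr & 0xFFFu) ^ 0x800u) - 0x800;
}

// Models "lui rd, hi" followed by a signed 12-bit offset,
// using 32-bit wrap-around arithmetic.
constexpr std::uint32_t rebuild(std::uint32_t addr) {
    return (hi20(addr) << 12) + static_cast<std::uint32_t>(lo12(addr));
}

// Made-up address, chosen so that the low part is negative.
constexpr std::uint32_t sample = 0x12345ABCu;

static_assert(hi20(sample) == 0x12346u, "upper part is rounded up");
static_assert(lo12(sample) == -1348, "low part is a negative offset");
static_assert(rebuild(sample) == sample, "both parts give the address back");
```

Because both parts are fixed once the address of the symbol is known, they can be encoded as immediates. An index such as the run-time value of i cannot, so it has to be added to the base address in a register first.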