For educational purposes I used https://godbolt.org/z/7F-Lhm to translate
// C++ code
char i = 3;
char A[] = {0,1,2,3,4,5};
int myfunction() {
    return A[i];
}
into
# RISC-V instructions
myfunction():                     # @myfunction()
    lui  a0, %hi(i)
    lbu  a0, %lo(i)(a0)
    lui  a1, %hi(A)
    addi a1, a1, %lo(A)
    add  a0, a0, a1
    lbu  a0, 0(a0)
    ret
i:
    .byte 3                       # 0x3
A:
    .ascii "\000\001\002\003\004\005"
But why is A[i] loaded with add a0, a0, a1 followed by lbu a0, 0(a0), and not just with lbu a0, a0(a1)?
It would make sense if, for lbu dest, offset(baseAddress), only dest and baseAddress are allowed to be registers, whereas offset is a hardcoded number in the instruction word itself. But in the same code above I see lbu a0, %lo(i)(a0), so offset can apparently also be "somewhat variable"?
Maybe the reason I don't understand this is that I don't really understand why this %hi/%lo thing is necessary in the first place. Why are we doing lui a0, %hi(i) followed by lbu a0, %lo(i)(a0) instead of just lbu a0, 0(i)?
As Erik Eidt said, i is a global variable, i.e. it resides somewhere in the 32/64-bit addressable memory and can change at any time.
The address of i is loaded in two parts, since a full 32/64-bit address cannot be encoded in the immediate field of a single instruction. %hi(i) and %lo(i) are the upper and lower parts of the address of i. Both are constants fixed at link time, so the %lo(i) offset is still an immediate encoded in the instruction word, not a register. The value of i is loaded from memory since it might have changed between calls to myfunction().
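To make the split concrete, here is a minimal sketch of the arithmetic for a 32-bit absolute address, assuming the usual 20-bit/12-bit division between lui and the load's immediate. The helper names (lo12, hi20, lui, effective_address, rebuild) are made up for illustration and are not part of any toolchain API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers that mimic what %lo and %hi evaluate to
// for a 32-bit absolute address.

// %lo(addr): the low 12 bits, read as a signed value in [-2048, 2047],
// because the 12-bit immediate of a load is sign-extended by the hardware.
constexpr std::int32_t lo12(std::uint32_t addr) {
    return (addr & 0x800u)
               ? static_cast<std::int32_t>(addr & 0xFFFu) - 0x1000
               : static_cast<std::int32_t>(addr & 0xFFFu);
}

// %hi(addr): the upper 20 bits, with 0x800 added first so that the
// result is one higher whenever lo12(addr) is negative.
constexpr std::uint32_t hi20(std::uint32_t addr) {
    return ((addr + 0x800u) >> 12) & 0xFFFFFu;
}

// lui rd, imm20  ->  rd = imm20 << 12
constexpr std::uint32_t lui(std::uint32_t imm20) {
    return imm20 << 12;
}

// Address used by "lbu rd, imm12(rs1)": a register value plus a constant.
// The offset must fit the 12-bit field; it cannot be a second register.
inline std::uint32_t effective_address(std::uint32_t base, std::int32_t imm12) {
    assert(imm12 >= -2048 && imm12 <= 2047);
    return base + static_cast<std::uint32_t>(imm12);
}

// "lui a0, %hi(x)" followed by "lbu a0, %lo(x)(a0)" reaches address x.
inline std::uint32_t rebuild(std::uint32_t addr) {
    return effective_address(lui(hi20(addr)), lo12(addr));
}
```

For the address 0x12345FFF, hi20 gives 0x12346 and lo12 gives -1, and rebuild puts them back together into 0x12345FFF. This also shows why A[i] needs the extra add: the index i is only known at run time and lives in a register, while the load instruction can only add a register and a fixed 12-bit constant.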