Operand usage in asm volatile for RISC-V instructions

I'm trying to write a C function that takes the 32-bit encoding of a RISC-V load/store (h/w/d) instruction and unrolls it into a sequence of byte load/store (b) instructions.

How would I go about doing this in asm volatile?

Here's an example of what I'm trying to do:

Input arg: 0032a303 (this is lw t1, 3(t0)). I want to unroll it as follows:

lb t2, 3(t0);
slli t2, t2, 0;
or t1, t1, t2;
lb t2, 4(t0);
slli t2, t2, 8;
or t1, t1, t2;
lb t2, 3(t0);
slli t2, t2, 16;
or t1, t1, t2;
lb t2, 3(t0);
slli t2, t2, 24;
or t1, t1, t2;

t2 here is a clobbered register, and I'm not worried about any of these registers being polluted. I want to do this in an asm volatile block.

Here's what I have tried:

void load_bytes(reg_t rd, reg_t rs1, reg_t imm, reg_t iter){
    for (int i = 0; i < iter ; i ++) {
        // "lb" from *(rs1 + imm + i) into t0
        // "slli" t0 by (iter - i) bytes
        // "or" t0 with rd
        asm volatile(
            "lb t0, %[imm_val](%[rs1_val])\n"
            "slli t0, t0, %[shift_val]\n"
            "or %[rd_val], %[rd_val], t0\n"
            : [rd_val] "+r" (rd)
            : [rs1_val] "r" (rs1), [imm_val] "i" (imm + i), [shift_val] "i" (sizeof(byte_t)*iter - i)
            : "t0"
        );
    }
}

I don't think what I've done is correct, and I don't find the extended asm documentation very helpful here either. Can someone please help me understand what mistakes I'm making in the inline asm?

I had initially tried to write this entirely in asm, but the disassembly suggests that the register number is being used as the memory address rather than the contents of that register.

I keep getting an error message saying "impossible constraint". I'm not really sure what that is supposed to mean.

Thanks in advance!

Solution

Your example code has several problems. It loads bytes 3, 4, 3, 3 instead of four consecutive bytes. It doesn't zero-initialize t1 before OR-ing into it. lb sign-extends, which is unlikely to be what you want for anything but the most significant byte; use lbu. It also shifts in increasing order, while your C code goes in reverse.

In the C code, sizeof(byte_t) is likely going to be 1, but the shift count has to be in bits, not bytes. i will never be equal to iter, so iter - i never produces 0, which again contradicts your description.

The direct cause of the "impossible constraint" error is the "i" constraint: it requires a constant known at compile time, and neither imm + i nor the shift expression is one, because they depend on function arguments and the loop counter. What's the point of iter? Do you want this to work for multiple sizes?
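To illustrate the constant requirement, here is a minimal sketch in which the offsets and shift counts are written directly into the asm template and only the base address is passed in a register. The name load_word_bytewise is made up for this example, and the portable C branch is an assumption added only so the snippet also builds on non-RISC-V hosts.

```c
#include <stdint.h>

/* Hypothetical helper (not from the question): build a little-endian
   32-bit value from four byte loads starting at base. */
static inline uint32_t load_word_bytewise(const void *base)
{
#if defined(__riscv)
    /* Register-width temporaries, so the compiler makes no assumption
       about the upper bits on RV64. */
    unsigned long res, tmp;
    __asm__ volatile(
        "lbu  %[res], 0(%[ptr])\n\t"      /* offsets are literal constants */
        "lbu  %[tmp], 1(%[ptr])\n\t"
        "slli %[tmp], %[tmp], 8\n\t"      /* shift counts are literal too */
        "or   %[res], %[res], %[tmp]\n\t"
        "lbu  %[tmp], 2(%[ptr])\n\t"
        "slli %[tmp], %[tmp], 16\n\t"
        "or   %[res], %[res], %[tmp]\n\t"
        "lbu  %[tmp], 3(%[ptr])\n\t"
        "slli %[tmp], %[tmp], 24\n\t"
        "or   %[res], %[res], %[tmp]"
        /* Early-clobber (&): outputs are written before ptr is last read. */
        : [res] "=&r"(res), [tmp] "=&r"(tmp)
        : [ptr] "r"(base)
        : "memory");
    return (uint32_t)res;
#else
    /* Portable fallback with the same result, for non-RISC-V builds. */
    const uint8_t *p = (const uint8_t *)base;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
#endif
}
```

Letting the compiler pick the temporary through an output operand avoids hard-coding t0 in the clobber list. A caller would pass the already computed address, for example load_word_bytewise((const uint8_t *)rs1 + imm).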

This code is simple enough that the C compiler should be perfectly capable of handling it. Also, you probably want to return the result, so rd should be a pointer.

What's wrong with:

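#include <stdint.h>

void load_bytes(reg_t* rd, reg_t rs1, reg_t imm)
{
    /* Treat rs1 + imm as a byte address. */
    const uint8_t* p = (const uint8_t*) rs1 + imm;
    /* Cast before shifting so that p[3] << 24 cannot overflow a signed int. */
    *rd = (uint32_t) p[0] | (uint32_t) p[1] << 8 | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}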
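If the point of iter was to support several access sizes (h/w/d), a plain C loop can cover them too. The following is only a sketch under that assumption; load_le and its parameters are names invented here, and the sign_extend flag is meant to mirror the difference between lh/lw and lhu/lwu.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read nbytes (1..8) little-endian bytes from p and
   optionally sign-extend the result to 64 bits. */
static uint64_t load_le(const uint8_t *p, unsigned nbytes, bool sign_extend)
{
    uint64_t value = 0;

    /* Byte i lands at bit position 8 * i, so the shift is in bits. */
    for (unsigned i = 0; i < nbytes; i++)
        value |= (uint64_t)p[i] << (8 * i);

    if (sign_extend && nbytes > 0 && nbytes < 8) {
        /* Flip the sign bit, then subtract it: this propagates the sign
           into the upper bits using only unsigned arithmetic. */
        uint64_t sign_bit = (uint64_t)1 << (8 * nbytes - 1);
        value = (value ^ sign_bit) - sign_bit;
    }
    return value;
}
```

With an optimizing compiler and a constant nbytes, a loop like this is typically unrolled into the same lbu/slli/or sequence you were writing by hand.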

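I'm not sure what the point of the exercise is, or why you don't want to just use an ordinary load (possibly with a byte swap).

If you want the pure asm version, that could look like this (with rd in a0, rs1 in a1 and imm in a2, as the standard calling convention passes them):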

    add     a2,a2,a1
    lbu     a3,0(a2)
    lbu     a1,1(a2)
    slli    a1,a1,8
    or      a3,a3,a1
    lbu     a1,2(a2)
    slli    a1,a1,16
    or      a3,a3,a1
    lbu     a1,3(a2)
    slli    a1,a1,24
    or      a3,a3,a1
    sw      a3,0(a0)
    ret
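Since the question starts from a raw encoding such as 0032a303, the operands first have to be pulled out of the instruction word. Below is a rough sketch for loads only (I-type format); stores use the S-type format, where the immediate is split across two fields, and are not handled here. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the fields of an I-type load. */
struct load_fields {
    unsigned rd;      /* destination register number */
    unsigned rs1;     /* base register number */
    unsigned funct3;  /* 0=lb 1=lh 2=lw 3=ld 4=lbu 5=lhu 6=lwu */
    int32_t  imm;     /* sign-extended 12-bit offset */
};

/* Returns 0 on success, -1 if insn does not use the LOAD opcode. */
static int decode_load(uint32_t insn, struct load_fields *out)
{
    if ((insn & 0x7f) != 0x03)
        return -1;

    out->rd     = (insn >> 7) & 0x1f;
    out->funct3 = (insn >> 12) & 0x7;
    out->rs1    = (insn >> 15) & 0x1f;

    /* Bits 31:20 hold the immediate; sign-extend it from 12 bits. */
    out->imm = (int32_t)(((insn >> 20) & 0xfff) ^ 0x800) - 0x800;
    return 0;
}
```

For 0032a303 this gives rd = 6 (t1), rs1 = 5 (t0), funct3 = 2 (lw) and imm = 3. The access size in bytes is 1 << (funct3 & 3), and bit 2 of funct3 selects the zero-extending variants.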