
What might cause a SIGILL (other than an illegal instruction) on RISC-V?


I am trying to load some Forth into my Forth compiler running on a RISC-V SBC (though I don't believe this is really a Forth question):

 >load /root/repos/riscyforth/test2.4th
 : cuboid * * [ The cuboid has a volume of ] . ;

 OK
 cuboid


 Program received signal SIGILL, Illegal instruction.
 0x0000003ff7dbd038 in ?? ()

Above, I load the Forth file. Its first line, echoed back to the terminal, is the definition of the word cuboid, and the OK that follows shows that the Forth compiler has compiled the word successfully.

The second line invokes this word, cuboid, and is followed by the message that the program (run under GDB in this case) has crashed with a SIGILL.

However, this is what a disassembly shows:

(gdb) disassemble 0x3ff7dbd038, 0x3ff7dbd078
Dump of assembler code from 0x3ff7dbd038 to 0x3ff7dbd078:
=> 0x0000003ff7dbd038:    addi    s9,s9,-8
   0x0000003ff7dbd03c:    sd      s7,0(s9)
   0x0000003ff7dbd040:    li      s8,63
   0x0000003ff7dbd044:    slli    s8,s8,0x20
   0x0000003ff7dbd048:    lui     t0,0xf7dbd
   0x0000003ff7dbd04c:    ori     t0,t0,0
   0x0000003ff7dbd050:    slli    t0,t0,0x20
   0x0000003ff7dbd054:    srli    t0,t0,0x20
   0x0000003ff7dbd058:    or      s8,s8,t0
   0x0000003ff7dbd05c:    addi    s8,s8,112
   0x0000003ff7dbd060:    mv      s7,s8
   0x0000003ff7dbd062:    nop
   0x0000003ff7dbd064:    lui     t0,0x10
   0x0000003ff7dbd068:    addi    t0,t0,1976 # 0x107b8 <COLON_NEXT>
   0x0000003ff7dbd06c:    jr      t0
   0x0000003ff7dbd070:    addi    a2,sp,868
   0x0000003ff7dbd072:    nop
   0x0000003ff7dbd074:    unimp
   0x0000003ff7dbd076:    unimp

This is exactly what I expect, and, as can be seen, there is a perfectly valid instruction at 0x0000003ff7dbd038.

This memory is mmap'ed as executable, and this mechanism works well for words I define on the command line (as opposed to words read in from the file). Moreover, if I only define the word in the file I load and then run it from the command line, that also works. (I am aware this might suggest a problem with the loading, but I cannot see one, or why it would generate this signal.)

To add to the puzzle, if I step through the code using GDB then I don't get this SIGILL problem and the code beginning at 0x0000003ff7dbd038 executes as I would expect.

This is RV64 on an RVBoards Nezha. Immediately before the crash, the standard Forth inner-interpreter code is executed:

  NEXT:
            ld s8, 0(s7)            # Word address register takes content of next secondary
            addi s7, s7, ADDRWIDTH  # Next secondary along

   RUN:
            ld t0, 0(s8)            # Extract first instruction address of primitive

Solution

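  • Unlike x86, RISC-V does not guarantee that machine code you have just written is visible to instruction fetch without explicit synchronisation, even within the same thread. Before executing newly written machine code, issue a fence.i instruction to synchronise the instruction cache with the current state of memory. Until then the CPU may fetch stale bytes from its instruction cache, which can decode as an illegal instruction even though GDB, which reads memory through the data side, disassembles the new code correctly. Single-stepping probably hides the problem because GDB inserts breakpoints by writing to the code, and the kernel flushes the instruction cache when it does so.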
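A minimal C sketch of that write, synchronise, execute order, assuming Linux with GCC or Clang on riscv64 or x86_64 (on other architectures it just returns -1). The helper names run_generated and jit_return_42 and the hand-assembled "return 42" byte sequences are hypothetical, chosen for illustration. It calls the GCC/Clang builtin __builtin___clear_cache rather than a raw fence.i: fence.i only synchronises the hart that executes it, and because Linux may migrate a thread to another hart, user space is expected to use the riscv_flush_icache system call (exposed by glibc as __riscv_flush_icache), which, as far as I know, is what the builtin lowers to on RISC-V Linux. On a single-hart board such as the Nezha, a fence.i after compiling each word may be enough in practice.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std modes on glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*int_fn)(void);

/* Copy `len` bytes of machine code into a fresh mapping, synchronise the
   instruction stream with it, call it, and store its return value.
   Returns 0 on success, -1 if mapping or protection fails. */
static int run_generated(const void *code, size_t len, int *result)
{
    /* Map writable first; the code is made executable only after writing. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len); /* these stores go through the data side */

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Make the new bytes visible to instruction fetch. Effectively a no-op
       on x86; on RISC-V skipping this can execute stale bytes (SIGILL). */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    int_fn fn = (int_fn)mem; /* POSIX-style object-to-function pointer cast */
    *result = fn();

    munmap(mem, len);
    return 0;
}

/* Hypothetical sample code: a function that returns 42 on the host CPU. */
#if defined(__riscv) && __riscv_xlen == 64
#define HAVE_SAMPLE_CODE 1
static const uint32_t sample_code[] = {
    0x02a00513u, /* addi a0, zero, 42   (li a0, 42) */
    0x00008067u  /* jalr zero, 0(ra)    (ret)       */
};
#elif defined(__x86_64__)
#define HAVE_SAMPLE_CODE 1
static const unsigned char sample_code[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xc3                          /* ret         */
};
#endif

/* Generate and run the sample; returns 0 and sets *out on success. */
int jit_return_42(int *out)
{
#ifdef HAVE_SAMPLE_CODE
    return run_generated(sample_code, sizeof sample_code, out);
#else
    (void)out;
    return -1; /* no sample machine code for this architecture */
#endif
}
```

Switching the mapping from writable to executable before running it follows a common W^X policy; a Forth compiler that keeps its dictionary both writable and executable would still need the synchronisation step after every newly compiled word, before the inner interpreter jumps into it.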