RISC-V trap handler reentrancy on an exception in the trap handler

It seems that it's a common trick to set mscratch to a trap handler memory context during system initialization and then execute:

csrrw tp, CSR_MSCRATCH, tp

at the beginning of the machine mode trap handler to swap tp and mscratch. tp is then used in the machine mode trap handler to access memory/state needed by the trap handler itself. At the end of the trap handler this is executed again to restore the memory context pointer to mscratch and the original value of tp.

However, there seems to be a problem. If an exception is taken within the trap handler itself, the swap runs a second time: the original value of tp, which is now in mscratch, is exchanged with the machine mode trap handler context. The nested handler would consequently be using the wrong memory context (the interrupted code's tp) when handling the exception generated by the trap handler.
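The failure can be reproduced on paper with a small host-side model. This is only a sketch: `hart_model` and `swap_tp_mscratch` are hypothetical names, and two plain variables stand in for the tp register and the mscratch CSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: two variables stand in for tp and mscratch. */
typedef struct hart_model {
    uintptr_t tp;       /* models the tp register  */
    uintptr_t mscratch; /* models the mscratch CSR */
} hart_model;

/* Models `csrrw tp, CSR_MSCRATCH, tp`: exchange tp and mscratch. */
static void swap_tp_mscratch(hart_model *h) {
    assert(h != NULL);
    uintptr_t old_scratch = h->mscratch;
    h->mscratch = h->tp;
    h->tp = old_scratch;
}
```

Calling `swap_tp_mscratch` once models the first trap entry and leaves tp holding the handler context. Calling it a second time, as a nested trap entry would, puts the interrupted code's tp back into tp, so the nested handler would dereference the wrong pointer.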

Detecting this condition by reading MPP (a field of mstatus) would itself require a free register.

How can this be handled?

Solution

I have a toy handler that supports nested exceptions. I always¹ keep a pointer to the context block of the currently running code in that scratch csr, including during the execution of exception handlers themselves.

My toy handler starts something like this:

csrrw t0, scratch, t0   # swap interrupted t0 and its context block pointer
beqz t0, bootUp         # or if it is null then put the base CBP in there...
sw t1, 12(t0)           # save interrupted t1 to its context block
lw t1, 4(t0)            # fetch context block pointer for me
csrrw t1, scratch, t1   # t1 has interrupted t0, CSR scratch points to my context block
sw t1, 8(t0)            # save interrupted t0 to its context block
sw t2, 16(t0)           # save t2

After this code, t0 is a pointer to the interrupted code's context block, the interrupted t0, t1 & t2 are saved into the interrupted code's context block, and thus t1 and t2 are free to be used, and the scratch csr is a pointer to the context block for this run of the exception handler.

A context block for user code can be stored anywhere, e.g. malloced as needed. In each user context block (e.g. at 4(t0) in the above), I store a pointer to the special exception handler context block array.

Each exception handler context block (in the special array) is preinitialized so that its own pointer at offset 4 refers to the next entry in the array. Nested exceptions therefore naturally take a different context block from the special array, which works like a stack for dynamically nested exception handlers: the array can service as many levels of nested exceptions as it has elements.
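The chaining can be sketched in C as follows. Several things here are assumptions rather than part of the handler above: the names, `MAX_NESTING`, the placeholder field at offset 0 (the assembly does not show what lives there), and ending the chain with a null pointer in the last entry. The field offsets only match the 4/8/12/16 used by the assembly on a target with 4-byte pointers such as RV32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NESTING 4 /* assumed nesting depth, not from the original */

typedef struct context_block {
    uint32_t reserved;                /* offset 0: placeholder, contents unknown */
    struct context_block *handler_cb; /* offset 4 on RV32: block the next trap uses */
    uint32_t t0;                      /* offset 8  */
    uint32_t t1;                      /* offset 12 */
    uint32_t t2;                      /* offset 16 */
} context_block;

/* The special array of exception handler context blocks. */
static context_block handler_cbs[MAX_NESTING];

/* Point every entry at the next one; the last entry ends the chain (assumption). */
static void init_handler_cbs(void) {
    for (size_t i = 0; i < MAX_NESTING; i++) {
        handler_cbs[i].handler_cb =
            (i + 1 < MAX_NESTING) ? &handler_cbs[i + 1] : NULL;
    }
}

/* A user context block always refers to the first handler block. */
static void init_user_cb(context_block *cb) {
    assert(cb != NULL);
    cb->handler_cb = &handler_cbs[0];
}

/* Which block does the handler at a given nesting depth run on?
   depth 1 is the handler that interrupted `interrupted` itself. */
static context_block *handler_cb_at_depth(context_block *interrupted,
                                          unsigned depth) {
    context_block *cb = interrupted;
    for (unsigned i = 0; i < depth && cb != NULL; i++) {
        cb = cb->handler_cb;
    }
    return cb;
}
```

With this layout, a trap taken from user code runs on `handler_cbs[0]`, a trap taken inside that handler runs on `handler_cbs[1]`, and so on until the array is exhausted.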

The context blocks used for exception handlers could have other data, such as a pointer to global data, if desired.

Handling a quick interrupt may save a few registers, run some code, restore those registers and resume the interrupted code. For some other interrupt the scheduler may choose to switch contexts (e.g. I/O ready), resuming a different thread than was last interrupted. Since the proper context block already has the partial state of an interrupted thread, for a full context switch only the additional register state needs to be saved (no need to transfer any register state from a temporary place to the proper context block).

¹ Except for that ~4-5 instruction window at the beginning of the handler.

As part of resumption, for whatever thread is being resumed, one would restore its t0 and put its context block pointer back into the scratch csr.
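The pairing of entry and resumption can be modelled on the host like this. It is a sketch under assumptions: `cpu_model`, `trap_enter` and `trap_resume` are hypothetical names, only t0 is tracked, and the real work is of course done in assembly with the scratch csr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced context block: just the fields this model needs. */
typedef struct context_block {
    struct context_block *handler_cb; /* block the next trap should run on */
    uint32_t saved_t0;                /* t0 of the code that owns this block */
} context_block;

/* Hypothetical CPU model: one register and the scratch csr. */
typedef struct cpu_model {
    uint32_t t0;
    context_block *scratch; /* context block of the currently running code */
} cpu_model;

/* Trap entry: save the interrupted t0 into the interrupted code's block and
   make scratch point at the handler's own block. Returns the interrupted block. */
static context_block *trap_enter(cpu_model *cpu) {
    assert(cpu != NULL && cpu->scratch != NULL);
    context_block *interrupted = cpu->scratch;
    interrupted->saved_t0 = cpu->t0;
    cpu->scratch = interrupted->handler_cb;
    return interrupted;
}

/* Resumption: scratch again points at the resumed code's block and its t0
   is restored. `target` may be the interrupted code or any suspended thread. */
static void trap_resume(cpu_model *cpu, context_block *target) {
    assert(cpu != NULL && target != NULL);
    cpu->scratch = target;
    cpu->t0 = target->saved_t0;
}
```

Because `trap_resume` takes the block to resume as a parameter, the same path covers returning to an interrupted handler, returning to the interrupted user thread, and switching to a different suspended thread.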

If nested exceptions are encountered, the thread to resume is simply the interrupted exception handler, whereas if a user thread is interrupted, it is a scheduler decision whether to resume the most recently interrupted user thread (cheaper, as it often does not require a full context switch) or to resume another previously suspended user thread, which may have been suspended by a time slice timeout or by an I/O syscall. If an external interrupt indicates a completed I/O operation, a thread waiting on that I/O may be appropriate to resume. (I'm doing the system calls as csrrw zero, %arg, zero, so as to encode a syscall number in a trapping instruction, and also to avoid using ecall, since RARS services that.)

In my toy system, I support multithreading at the user level, so one thread can ask the "operating system" via a syscall to create another thread. Thus, the user threads' context blocks can be "malloc"ed while the exception handler context blocks are stacked in that pre-initialized array.

My toy system runs under RARS, and so yes, it does not have the complications of multiple address spaces.