I am trying to implement a recursive factorial function in RISC-V assembly language that raises an error if there is an overflow. However, I am struggling to detect it. Is there a solution for this ?
`
Have a look at how GCC implements __builtin_smul_overflow
or umul
(https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html)
unsigned checked_umul(unsigned a, unsigned b, bool *overflow_happened)
{
unsigned res;
bool ovf = __builtin_umul_overflow(a,b, &res);
if (ovf){
*overflow_happened = 1; // set or leave unmodified
// check once at the end.
}
return res;
}
clang 16 -O2 for rv32gc on Godbolt
checked_umul(unsigned int, unsigned int, bool*):
mulhu a3, a0, a1 # high half of a * b
mul a0, a0, a1 # result = a * b
beqz a3, .LBB1_2
li a1, 1
sb a1, 0(a2) # just an example of something to branch on
.LBB1_2:
ret
Of course you'd inline this in asm, not actually call a function; making it a function in C is just so we can look at the asm for it on its own, as in How to remove "noise" from GCC/clang assembly output?
You can branch on the high half being non-zero and do whatever you like, or you could OR
the high halves together across factorial iterations to just check for overflow at the end.
(If you expect callers to often pass inputs that overflow, an early-out on it can save time, and make the worst case performance only about 13 multiplies, as 13!
overflows a 32-bit integer. vs. a worst-case time of about 2^32-1 iterations with an early out. But otherwise just an or
can be cheaper than branching every time. Of course if you cared about performance, you wouldn't be doing a recursive implementation; that adds a lot of overhead vs. a loop.)
As Jester suggested, mulhu
to get the high half of a full multiply will let you check if the high half is non-zero. That's the same as checking if the full result fits in the same width as the input operands.
(mulh
is a signed multiply, so in the general case you'd want to check that it's the sign-extension of the low half, to check for signed overflow if truncating the result to the width of the inputs. For a factorial with non-negative inputs, you could maybe check it for zero, or better just use unsigned to increase the value-range you can support.)
On a CPU with efficient multipliers, especially one that will fuse mulhu
/mul
into a single operation for a widening multiply ALU, the extra mulhu
is cheaper than anything else you might do, like count leading zeros of the inputs. Especially without extension-B, since baseline RISC-V left out many instructions that are common in other mainstream ISAs, such as bit-scan.