Search code examples
assemblyoverflowriscvinteger-overflowriscv32

How to detect an overflow on assembler risc-v?


I am trying to implement a recursive factorial function in RISC-V assembly language that raises an error if there is an overflow. However, I am struggling to detect it. Is there a solution for this ?

`


Solution

  • Have a look at how GCC implements __builtin_smul_overflow or umul (https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html)

    unsigned checked_umul(unsigned a, unsigned b, bool *overflow_happened)
    {
        unsigned res;
        bool ovf = __builtin_umul_overflow(a,b, &res);
        if (ovf){
            *overflow_happened = 1;   // set or leave unmodified
                                      // check once at the end.
        }
        return res;
    }
    

    clang 16 -O2 for rv32gc on Godbolt

    checked_umul(unsigned int, unsigned int, bool*):
            mulhu   a3, a0, a1            # high half of a * b
            mul     a0, a0, a1            # result = a * b
            beqz    a3, .LBB1_2
            li      a1, 1
            sb      a1, 0(a2)             # just an example of something to branch on
    .LBB1_2:
            ret                          
    

    Of course you'd inline this in asm, not actually call a function; making it a function in C is just so we can look at the asm for it on its own, as in How to remove "noise" from GCC/clang assembly output?

    You can branch on the high half being non-zero and do whatever you like, or you could OR the high halves together across factorial iterations to just check for overflow at the end.

    (If you expect callers to often pass inputs that overflow, an early-out on it can save time, and make the worst case performance only about 13 multiplies, as 13! overflows a 32-bit integer. vs. a worst-case time of about 2^32-1 iterations with an early out. But otherwise just an or can be cheaper than branching every time. Of course if you cared about performance, you wouldn't be doing a recursive implementation; that adds a lot of overhead vs. a loop.)


    As Jester suggested, mulhu to get the high half of a full multiply will let you check if the high half is non-zero. That's the same as checking if the full result fits in the same width as the input operands.

    (mulh is a signed multiply, so in the general case you'd want to check that it's the sign-extension of the low half, to check for signed overflow if truncating the result to the width of the inputs. For a factorial with non-negative inputs, you could maybe check it for zero, or better just use unsigned to increase the value-range you can support.)

    On a CPU with efficient multipliers, especially one that will fuse mulhu/mul into a single operation for a widening multiply ALU, the extra mulhu is cheaper than anything else you might do, like count leading zeros of the inputs. Especially without extension-B, since baseline RISC-V left out many instructions that are common in other mainstream ISAs, such as bit-scan.