Search code examples
assemblyx86gnu-assemblerdivide-by-zeroi386

GAS assembly snippet divides by 0, not sure why


I have the following function, involving a snippet of i386 assembly in GAS syntax:

inline int MulDivRound(
    int nNumber,
    int nNumerator,
    int nDenominator )
{
    int nRet, nMod;

    __asm__ __volatile__ (
        "mov    %2,     %%eax   \n"
        "mull   %3              \n"
        "divl   %4              \n"
        "mov    %%eax,  %0      \n"
        "mov    %%edx,  %1      \n"

        :   "=m"    (nRet),
            "=m"    (nMod)
        :   "m"     (nNumber),
            "m"     (nNumerator),
            "m"     (nDenominator)
        :   "eax", "edx"
    );

    return nRet + nMod*2 / nDenominator;
}

I've noticed that, in a few instances, I'm getting an EXC_I386_DIV crash with this function. The following call produces such a crash:

int res = MulDivRound( 4096, -566, 400 );

I can't see clearly what is happening to cause this function to divide by 0: surely it just moves 4096 to eax, then multiplies that by -566, then divides that by 400, returning the two components of the result of the division operation. Can anyone shed some light on this?


Solution

  • division / multiplication instructions in x86 ... there's a few things wrong in this code:

    You're using signed operands with the unsigned mul / div operations. The operations that you really perform therefore are:

    1. the signed -566 (0xfffffdca as 2-complement 32bit) is interpreted as unsigned 4294958538
    2. this is multiplied by 4096 resulting in 17592183726080 (0xfff:0xffdca000 in EDX:EAX). Notice the lower 32bit of that convert to -2318336 as you "expect"
    3. The full 64bit value is divided by 400 but due to the fact that the upper 32bit are 0xfff, 4095), the result exceeds UINT32_MAX and the exception is raised.

    If you clear the upper 32bits by inserting an xor %%edx,%%edx before the divl, the operation will succeed but it'll return you something you don't expect - namely, it divides 0xffdca000 (4292648960) by 400 resulting in 0xa3c066 (10731622) in EAX and the remainder, 0xa0 (160) in EDX.

    That's "correct" as far as what you instructed the machine to do, but not what you expect. If you want to use signed numbers, you need imul / idiv instead.

    The assembly can ultimately be simplified into the following:

    __asm__ __volatile__ (
        "imull   %3              \n"
        "idivl   %4              \n"
        :   "=a"    (nRet),
            "=&d"   (nMod)
        :   "a"     (nNumber),
            "mr"    (nNumerator),
            "mr"    (nDenominator)
        :   "cc"
    );
    

    That's because gcc allows to specify which registers to use as input / output, so no data moves are necessary at all here. Also, the "m" constraint alone creates suboptimal code on 64bit as it forces the arguments onto the stack; give it an alternative and the generated code will be better.

    Edit: just changed the nMod constraint to "=&d"(nMod); it needs to be what gcc calls an "early clobber". This means that the specified output register is overwritten before all input operands are consumed/used, and tells the compiler not to pass inputs (the (nDenominator), in particular) in EDX. Otherwise, were that to happen, it would cause an "interesting" failure mode. This is not an issue if you only use the "m" for nNumerator/nDenominator but once registers are allowed, one better be careful.

    Edit2: Also note that the above code isn't proof against overflow exceptions of course. You can still call it like MulDivRound(INT32_MAX, 4, 2) to trigger those. Legitimately / by the way these instructions are designed. If you must make sure that doesn't happen, you've got to add code which compares the denominator against EDX/RDX before the [i]div and handle the case where it's smaller.