I have the following function, involving a snippet of i386 assembly in GAS syntax:
inline int MulDivRound(
int nNumber,
int nNumerator,
int nDenominator )
{
int nRet, nMod;
__asm__ __volatile__ (
"mov %2, %%eax \n"
"mull %3 \n"
"divl %4 \n"
"mov %%eax, %0 \n"
"mov %%edx, %1 \n"
: "=m" (nRet),
"=m" (nMod)
: "m" (nNumber),
"m" (nNumerator),
"m" (nDenominator)
: "eax", "edx"
);
return nRet + nMod*2 / nDenominator;
}
I've noticed that, in a few instances, I'm getting an EXC_I386_DIV
crash with this function. The following call produces such a crash:
int res = MulDivRound( 4096, -566, 400 );
I can't see clearly what is happening to cause this function to divide by 0: surely it just moves 4096 to eax
, then multiplies that by -566, then divides that by 400, returning the two components of the result of the division operation. Can anyone shed some light on this?
division / multiplication instructions in x86 ... there's a few things wrong in this code:
You're using signed operands with the unsigned mul
/ div
operations. The operations that you really perform therefore are:
-566
(0xfffffdca
as 2-complement 32bit) is interpreted as unsigned 4294958538
4096
resulting in 17592183726080
(0xfff:0xffdca000
in EDX:EAX
). Notice the lower 32bit of that convert to -2318336
as you "expect"400
but due to the fact that the upper 32bit are 0xfff
, 4095
), the result exceeds UINT32_MAX
and the exception is raised.If you clear the upper 32bits by inserting an xor %%edx,%%edx
before the divl
, the operation will succeed but it'll return you something you don't expect - namely, it divides 0xffdca000
(4292648960
) by 400
resulting in 0xa3c066
(10731622
) in EAX
and the remainder, 0xa0
(160
) in EDX
.
That's "correct" as far as what you instructed the machine to do, but not what you expect. If you want to use signed numbers, you need imul
/ idiv
instead.
The assembly can ultimately be simplified into the following:
__asm__ __volatile__ (
"imull %3 \n"
"idivl %4 \n"
: "=a" (nRet),
"=&d" (nMod)
: "a" (nNumber),
"mr" (nNumerator),
"mr" (nDenominator)
: "cc"
);
That's because gcc allows to specify which registers to use as input / output, so no data moves are necessary at all here. Also, the "m"
constraint alone creates suboptimal code on 64bit as it forces the arguments onto the stack; give it an alternative and the generated code will be better.
Edit: just changed the nMod
constraint to "=&d"(nMod)
; it needs to be what gcc calls an "early clobber". This means that the specified output register is overwritten before all input operands are consumed/used, and tells the compiler not to pass inputs (the (nDenominator)
, in particular) in EDX
. Otherwise, were that to happen, it would cause an "interesting" failure mode. This is not an issue if you only use the "m"
for nNumerator
/nDenominator
but once registers are allowed, one better be careful.
Edit2: Also note that the above code isn't proof against overflow exceptions of course. You can still call it like MulDivRound(INT32_MAX, 4, 2)
to trigger those. Legitimately / by the way these instructions are designed. If you must make sure that doesn't happen, you've got to add code which compares the denominator against EDX
/RDX
before the [i]div
and handle the case where it's smaller.