fprem1 doesn't spit out the result I expected

SO I was given to understand that fprem1 do ST0 / ST1 and put remainder on ST0 (so replacing it )

but when I did something like this

mov dword ptr [402000],2
mov dword ptr [402004],3
fild dword ptr [402000]
fild dword ptr [402004]
freem1

I get -1 remainder when I'm expecting it should be 1.

Solution

TL:DR: Use fprem, not fprem1, to get the behaviour you're expecting. Or better, do it with SSE2 instead of messing around with obsolete x87.

fprem implements the fmod() IEEE / ISO C standard function, while fprem1 implements the remainder() standard function.

fprem1 is doing exactly what the instruction reference manual entry for it says it should. (See also the x86 tag wiki for links to Intel's official PDF). Condensed quote:

The remainder represents the following value:

Remainder ← ST(0) − (Q ∗ ST(1))

Here, Q is an integer value that is obtained by rounding the floating-point number quotient of [ST(0) / ST(1)] toward the nearest integer value. The magnitude of the remainder is less than or equal to half the magnitude of the modulus (i.e. ST(1))

The table of results also confirms that of two positive inputs (+F) can give a positive or negative result, or positive zero. (+/-F or +0).

inputs: st0=3  st1=2
3/2 = 1.5  
Round to nearest(1.5): Q = 2.0  
Remainder = 3 - 2 * 2 = -1

You're expecting it to work like the integer modulus operator, where the division result is truncated towards zero, rather than rounded to nearest. That's what fprem does, not fprem1.

fprem (not fprem1):
... The sign of the remainder is the same as the sign of the dividend.

Also note that x87 is obsolete, and in new code it's usually best to use SSE2. e.g.

mov       eax, 2
cvtsi2sd  xmm2, eax
mov       eax, 3
cvtsi2sd  xmm3, eax
 ; or just accept them as function args in registers

; x=2 in xmm2.   y=3 in xmm3
movaps    xmm0, xmm3    ; save a copy of y
divsd     xmm3, xmm2    ; y/x = 3/2 = 1.5
roundsd   xmm1, xmm3, 0 ; SSE4.1  round to nearest integer.
mulsd     xmm1, xmm2    ; Q * divisor
subsd     xmm0, xmm1    ; dividend - (Q * divisor)

; xmm0 = y mod x   (fprem style, not fprem1)
; xmm3 = y/x

I forget what gcc does for nearbyint(x) when SSE4.1 roundsd isn't available, but check on that (with -ffast-math) for an SSE2 fallback. Something like converting to/from integer could work if you know the range is limited.