Does anyone know why the pmulhrsw
instruction or
_mm_mulhrs_epi16(x) := RoundDown((x * y + 16384) / 32768)
always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average.
Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased?
Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):
__m128i ChooseSign(x, sign)
{
return _mm_sign_epi16(x, sign)
}
signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15) // (a ^ b) >> 15
sign = _mm_andnot_si128(signsDifferent, a) // !signsDifferent & a
//result = ChooseSign(sqrt(a * b), sign) * fraction // biased
result = ChooseSign(sqrt(a * b) * fraction, sign)
A most serious mistake. I asked the same question on the Intel developer forums and andysem corrected me, pointing out the behavior is to round to the nearest integer.
I was mistaken into thinking it was biased because of the formula from MSDN
was (x * y + 16384) >> 15
. This looked very similar to the int(x + 0.5)
method for rounding, which I know is biased for negative #s and cringe at. But I didn't realize right shift for negative numbers isn't the same as dividing and casting to an int.
Plus, it wasn't matching my non-SIMD reference implementation, which turns out to be biased since I was calculating int(sum / 9.0f)
, which rounds towards zero.
I should've had more doubt before questioning the behavior of something implemented in hardware, which would've been rigorously vetted, since int(x + 0.5)
would be a very expensive mistake.
_mm_mulhrs_epi16()
still has some bias of always rounding x.5 towards +infinity
. But that's not a big deal for my application.