why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

Does anyone know why the pmulhrsw instruction or

_mm_mulhrs_epi16(x) := RoundDown((x * y + 16384) / 32768)

always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average.

Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased?

Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):

__m128i ChooseSign(x, sign)
{
  return _mm_sign_epi16(x, sign)
}
signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15)   // (a ^ b) >> 15
sign = _mm_andnot_si128(signsDifferent, a)    // !signsDifferent & a
//result = ChooseSign(sqrt(a * b), sign) * fraction   // biased
result = ChooseSign(sqrt(a * b) * fraction, sign)

Solution

A most serious mistake. I asked the same question on the Intel developer forums and andysem corrected me, pointing out the behavior is to round to the nearest integer.

I was mistaken into thinking it was biased because of the formula from MSDN

was (x * y + 16384) >> 15. This looked very similar to the int(x + 0.5) method for rounding, which I know is biased for negative #s and cringe at. But I didn't realize right shift for negative numbers isn't the same as dividing and casting to an int.

Plus, it wasn't matching my non-SIMD reference implementation, which turns out to be biased since I was calculating int(sum / 9.0f), which rounds towards zero.

I should've had more doubt before questioning the behavior of something implemented in hardware, which would've been rigorously vetted, since int(x + 0.5) would be a very expensive mistake.

_mm_mulhrs_epi16() still has some bias of always rounding x.5 towards +infinity. But that's not a big deal for my application.