Fixed point multiplication of negative numbers


I have the following method to multiply two 32-bit numbers in fixed-point 19.13 format, but I think there is a problem with its rounding:

1.5f is rounded up to 2.0f, while -1.5f is rounded up to -1.0f.

It seems to me that -1.5 should be rounded down to -2.0f.

Does the current rounding make sense, and if not, how can I change it to be more consistent?

static OPJ_INT32 opj_int_fix_mul(OPJ_INT32 a, OPJ_INT32 b) {
    OPJ_INT64 temp = (OPJ_INT64) a * (OPJ_INT64) b;
    temp += 4096;
    assert((temp >> 13) <= (OPJ_INT64)0x7FFFFFFF);
    assert((temp >> 13) >= (-(OPJ_INT64)0x7FFFFFFF - (OPJ_INT64)1));
    return (OPJ_INT32) (temp >> 13);
}

Solution

  • Since you always add 4096 (half of 8192, i.e. half of 2^13) before shifting, the code rounds to nearest with half-way cases going toward positive infinity. That is a somewhat unusual choice.

    To round every value toward positive infinity (a ceiling), I'd expect

    temp += 4096 + 4095;
    

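    To round in the usual fashion (to nearest, ties away from 0), add a bias away from 0 instead.

    temp += (temp < 0) ? -4096 : 4096;

    Note that this bias is only correct when it is followed by a division that truncates toward zero, such as temp / 8192. With an arithmetic temp >> 13, which rounds toward negative infinity, a value like -1.25 would come out as -2.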
    

    Rounding to nearest with ties to even takes more work, and I'm not certain OP wants that.
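
As a rough illustration of the rounding choices discussed above, here is a minimal C sketch. The helper names (floor_div_one, round_half_up, round_half_away, round_half_even, fix_mul_half_away) are made up for this example and are not part of OP's code. It uses int64_t and int32_t in place of OPJ_INT64 and OPJ_INT32, on the assumption that those are plain 64-bit and 32-bit signed integers. It divides by 8192 instead of shifting, so it does not depend on how the compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a 19.13 format; names are illustrative only. */
#define FRAC_BITS 13
#define ONE  ((int64_t)1 << FRAC_BITS)  /* 8192 represents 1.0 */
#define HALF (ONE / 2)                  /* 4096 represents 0.5 */

/* Floor division by 8192 without relying on >> of a negative value. */
static int64_t floor_div_one(int64_t x) {
    int64_t q = x / ONE;       /* C99 and later: truncates toward zero */
    if (x % ONE < 0) {
        q -= 1;                /* negative remainder: step down to the floor */
    }
    return q;
}

/* Nearest, ties toward positive infinity (what adding 4096 then flooring does). */
static int64_t round_half_up(int64_t p) {
    return floor_div_one(p + HALF);
}

/* Nearest, ties away from zero: signed bias plus truncating division. */
static int64_t round_half_away(int64_t p) {
    return (p + (p < 0 ? -HALF : HALF)) / ONE;
}

/* Nearest, ties to the even result. */
static int64_t round_half_even(int64_t p) {
    int64_t q = floor_div_one(p);
    int64_t r = p - q * ONE;   /* remainder in the range 0 .. 8191 */
    if (r > HALF || (r == HALF && q % 2 != 0)) {
        q += 1;                /* above half, or exactly half with q odd */
    }
    return q;
}

/* 19.13 multiply using ties-away-from-zero rounding. */
static int32_t fix_mul_half_away(int32_t a, int32_t b) {
    int64_t r = round_half_away((int64_t)a * (int64_t)b);
    assert(r >= INT32_MIN && r <= INT32_MAX);  /* result must fit in 32 bits */
    return (int32_t)r;
}
```

With these helpers, a raw product of 12288 (1.5 result units) gives 2 under all three rules, while -12288 gives -1 with round_half_up and -2 with the other two. A raw product of 20480 (2.5 units) gives 3, 3 and 2 respectively.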