Search code examples
c++bit-manipulationmultiplicationfixed-pointinteger-division

Emulated Fixed Point Division/Multiplication


I'm writing a Fixedpoint class, but have ran into bit of a snag... The multiplication, division portions, I am not sure how to emulate. I took a very rough stab at the division operator but I am sure it's wrong. Here's what it looks like so far:

class Fixed
{
    Fixed(short int _value, short int _part) : 
        value(long(_value + (_part >> 8))), part(long(_part & 0x0000FFFF)) {};

    ...

    inline Fixed operator -() const  // example of some of the bitwise it's doing
    {
        return Fixed(-value - 1, (~part)&0x0000FFFF);
    };

    ...

    inline Fixed operator / (const Fixed & arg) const // example of how I'm probably doing it wrong
    {
        long int tempInt = value<<8 | part;
        long int tempPart = tempInt;
        tempInt  /= arg.value<<8 | arg.part;
        tempPart %= arg.value<<8 | arg.part;
        return Fixed(tempInt, tempPart);
    };

    long int value, part; // members
};

I... am not a very good programmer, haha!

The class's part is 16 bits wide (but expressed as a 32-bit long since I imagine it'd need the room for possible overflows before they're fixed) and the same goes for value which is the integer part. When the 'part' goes over 0xFFFF in one of it's operations, the highest 16 bits are added to 'value', and then the part is masked so only it's lowest 16 bits remain. That's done in the init list.

I hate to ask, but if anyone would know where I could find documentation for something like this, or even just the 'trick' or how to do those two operators, I would be very happy for it! I am a dimwit when it comes to math, and I know someone has had to do/ask this before, but searching google has for once not taken me to the promised land...


Solution

  • As Jan says, use a single integer. Since it looks like you're specifying 16 bit integer and fractional parts, you could do this with a plain 32 bit integer.

    The "trick" is to realise what happens to the "format" of the number when you do operations on it. Your format would be described as 16.16. When you add or subtract, the format stays the same. When you multiply, you get 32.32 -- So you need a 64 bit temporary value for the result. Then you do a >>16 shift to get down to 48.16 format, then take the bottom 32 bits to get your answer in 16.16.

    I'm a little rusty on the division -- In DSP, where I learned this stuff, we avoided (expensive) division wherever possible!