Search code examples
c++fixed-point

How to connect the theory of fixed-point numbers and its practical implementation?


The theory of fixed-point number is that we divide certain number of bits between integer part and fractional part. This amount is fixed.

For example, 26.5 is stored in that order:
power of 2 representation


To convert from floating-point to fixed-point, we follow this algorithm:

  • Calculate x = floating_input * 2^(fractional_bits)

    27.3 * 2^10 = 27955.2
    
  • Round x to the nearest whole number (e.g. round(x))

    27955
    
  • Store the rounded x in an integer container


Now if we look on the bit representation of our numbers and on what multiplying on 2^(fractional_bits) makes, we will see:

27 is 11011
27*2^10 is 110 1100 0000 0000    which is shifting on 10 bits to the left.

So we can say, that multiplying on 2^10 indeed gives us "space" in the right part of bits for save forth altering of this number. We can make two such numbers converted in this way, interacting each other and eventually re-converted to familiar view with point by opposite dividing on 2^10.

If we recall that bits are stored in some integer variable, which in turn has its own amount of bits it gets clear that as more bits in that variable are devoted for fraction part as less bits remain for integer part of number.

27.3 * 2^10 = 27955.2 should be rounded for storing in integer type to
27955 which is 110 1101 0011 0011

after that number can be altered somehow, certain value isn't important now, and let's say, we want to retrieve back human-readable value:

27955/2^10 = 27,2998046875

What about amount of bits after point? Let's say we have two numbers with purpose to multiply them and we chose 10 bits after point

27 * 3.3 = 89.1 expected
27*2^10 = 27 648 is 110 1100 0000 0000
3.3*2^10 = 3 379 is     1101 0011 0011
27 648 * 3 379 = 93 422 592
consequently 
27*3.3 = 93 422 592/(2^10*2^10) = 89.09 pretty accurate

Let's take 1 bit after point

27 and 3.3
27*2^1 = 54 is 110110
3.3*2^1 = 6.6 after round 6 is 110
54 * 6 = 324
consequently 
27*3.3 = 324/(2^1*2^1) = 81 which is unsatisfying

On practice we can use next code to create and operate with fixed-point number:

#include <iostream>

using namespace std;

const int scale = 10;

#define DoubleToFixed(x) (x*(double)(1<<scale))
#define FixedToDouble(x) ((double)x / (double)(1<<scale))
#define IntToFixed(x) (x << scale)
#define FixedToInt(x) (x >> scale)
#define MUL(x,y) (((x)*(y)) >> scale)
#define DIV(x,y) ((x) << scale)

int main()
{
double a = 7.27;
double b = 3.0;

int f = DoubleToFixed(a);
cout << f<<endl;                  //7444
cout << FixedToDouble(f)<<endl;   //7.26953125

int g = DoubleToFixed(b);
cout << g<<endl;                  //3072
int c = MUL(f,g);
cout << FixedToDouble(c)<<endl;   //21.80859375

}

So, where is connection between the theory of fixed emplacement of point between bits (powers of 2) and practice implementation? If we store fixed-number in int, it is obvious, that there is no place for storing the point in it.

It seems that fixed-point numbers are just conversion for increase performance. And to retrieve human-readable number after calculations, the opposite conversion must present.

I hope, I understand the algorithm. But is the idea of placement of point between digits is just an abstract idea?


Solution

  • Fixed-point formats are used as a way to represent fractional numbers. Quite commonly, processors perform fixed-point or integer arithmetic faster or more efficiently than floating-point arithmetic. Whether fixed-point arithmetic is suitable for an application depends on what numbers the application needs to work with.

    Using fixed-point formats does require converting input to the fixed-point format and converting numbers in the fixed-point format to output. But this is also true of integers and floating-point: All input must be converted to whatever internal format is used to represent it, and all output must be produced by converting from internal formats.

    And how does multiplying on 2^(fractional_bits) affect the quantity of digits after the point?

    Suppose we have some number x that is represented as an integer X = x•2f, where f is the number of fraction bits. Conceptually X is in a fixed-point format. Similarly, we have y represented as Y = y•2f.

    If we execute an integer multiplication instruction to produce result Z = XY, then Z = XY = (x•2f)•(y•2f) = xy•22f. Then, if we divide Z by 2f (or, nearly equivalently, shift it right by f bits), we have xy•2f except for any rounding errors that may have occurred in the division. And xy•2f is the fixed-point representation of the product of x and y.

    Thus, we can effect a fixed-point multiplication by performing an integer multiplication followed by a shift.

    Often, to get rounding instead of truncation, a value of half of 2f is added before the shift, so we compute floor((XY + 2f−1) / 2f):

    • Multiply X by Y.
    • Add 2f−1.
    • Shift right f bits.