
Arduino Variable Size and Fixed Point


I am working on a project where I need to use fixed-point math, and I cannot figure out why the numbers are "rolling over". I only got a large enough range after changing the shift amount from 16 to 8 and finally to 4. Here is the code I am using at present:

#define SHIFT_AMOUNT 8
#define SHIFT_MASK ((1 << SHIFT_AMOUNT) - 1)
#define FIXED_ONE (1 << SHIFT_AMOUNT)
#define INT2FIXED(x) ((x) << SHIFT_AMOUNT)
#define FLOAT2FIXED(x) ((int)((x) * (1 << SHIFT_AMOUNT)))
#define FIXED2INT(x) ((x) >> SHIFT_AMOUNT)
#define FIXED2FLOAT(x) (((float)(x)) / (1 << SHIFT_AMOUNT))

int32_t test = FLOAT2FIXED(1.00);

void setup()
{
   Serial.begin(57600);
}

void loop(){

   test += FLOAT2FIXED(1.00);
   Serial.println(FIXED2FLOAT(test));

}

And the output:

1
2
3

...

127
-128
-127
-126

When SHIFT_AMOUNT = 8 I can only store values from -128 to 127. But since I am using a 32-bit variable, shouldn't a 16-bit shift move the binary point to the "middle", leaving two 16-bit sections, one for the whole number and the other for the fraction? Shouldn't the whole range of the int32_t be −2,147,483,648 to 2,147,483,647 with the shift at 16? Is there a setting that I am missing, or am I just way off with how this works?

If SHIFT_AMOUNT = 4 I get the range that I need, but this doesn't seem right, since all the other examples that I have seen online use a 16-bit shift.

Here is a link showing what I am looking to do

EDIT

If I have this correctly, shifting by 8 bits in a 16-bit-wide type leaves 8 bits for the whole part and 8 bits for the fractional part, giving a range of -128 to 127. Hence the need for the 4-bit shift, which increases the range of the whole part to -32,768 to 32,767. Is this correct? If so, is int32_t not truly 32 bits wide?
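For reference, the whole-number range of a signed fixed-point value follows directly from how many bits remain after the fractional bits and the sign bit are taken away. The sketch below is only an illustration of that arithmetic; the helper names `wholeMin` and `wholeMax` are made up here and do not come from any Arduino library.

```cpp
#include <stdint.h>

// Smallest and largest whole-number values of a signed fixed-point type
// with `totalBits` bits overall and `fracBits` bits after the binary point.
// Hypothetical helpers for illustration; they assume 1 <= fracBits < totalBits <= 32.
constexpr int32_t wholeMin(int totalBits, int fracBits) {
    return -(static_cast<int32_t>(1) << (totalBits - fracBits - 1));
}

constexpr int32_t wholeMax(int totalBits, int fracBits) {
    return (static_cast<int32_t>(1) << (totalBits - fracBits - 1)) - 1;
}

// 16-bit type,  8 fractional bits: wholeMin(16, 8)  == -128,   wholeMax(16, 8)  == 127
// 16-bit type,  4 fractional bits: wholeMin(16, 4)  == -2048,  wholeMax(16, 4)  == 2047
// 32-bit type, 16 fractional bits: wholeMin(32, 16) == -32768, wholeMax(32, 16) == 32767
```

By this arithmetic, a 16-bit type with a 4-bit shift reaches only about ±2,048. The -32,768 to 32,767 range is what a full 32-bit type provides with a 16-bit shift.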

EDIT2

Patrick Trentin pointed out where I was going wrong. Everything was correct except for the part I copied from the linked question: I was casting to an int, not an int32_t. The int type is 16 bits wide, hence my having to use a shift of 4 to get the range I needed.


Solution

  • Change this:

    #define FLOAT2FIXED(x) ((int)((x) * (1 << SHIFT_AMOUNT)))
    

    into this:

    #define FLOAT2FIXED(x) ((int32_t)((x) * (1 << SHIFT_AMOUNT)))
    

    Rationale: int is 16 bits wide on an Arduino Uno (see the documentation), so the cast caps the values that you store in your int32_t variable at 16 bits.


    EDIT:

    The fact that int16_t is an alias of signed int, which is the same type as int, can be corroborated either by looking at the online documentation or by reading the file

    arduino-version/hardware/tools/avr/lib/avr/include/stdint.h

    among the Arduino Uno sources:

    /** \ingroup avr_stdint
        16-bit signed type. */
    
    typedef signed int int16_t;
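One further caveat, offered as a sketch rather than as part of the original fix: where int is 16 bits wide, the literal `1` in `(1 << SHIFT_AMOUNT)` is itself a 16-bit int, so a shift amount of 16 is not well defined there even after the outer cast is changed to int32_t. A possible alternative, assuming typed helper functions in place of the macros, keeps every intermediate value in int32_t. The names `FRAC_BITS`, `FIXED_UNIT`, `intToFixed`, `floatToFixed`, `fixedToInt`, and `fixedToFloat` are hypothetical and chosen only for this illustration.

```cpp
#include <stdint.h>

// Number of fractional bits (assumed Q16.16 layout for this sketch).
constexpr int FRAC_BITS = 16;

// The value 1.0 in fixed point. The cast happens before the shift, so the
// shift is carried out in 32 bits even where plain int is only 16 bits wide.
constexpr int32_t FIXED_UNIT = static_cast<int32_t>(1) << FRAC_BITS;

// Whole number -> fixed point. Multiplying avoids left-shifting negative values.
constexpr int32_t intToFixed(int32_t x) { return x * FIXED_UNIT; }

// Float -> fixed point; the fractional bits beyond FRAC_BITS are truncated.
constexpr int32_t floatToFixed(float x) {
    return static_cast<int32_t>(x * FIXED_UNIT);
}

// Fixed point -> whole number (drops the fractional bits).
constexpr int32_t fixedToInt(int32_t x) { return x >> FRAC_BITS; }

// Fixed point -> float, for printing or debugging.
constexpr float fixedToFloat(int32_t x) {
    return static_cast<float>(x) / FIXED_UNIT;
}
```

With these helpers, the loop from the question would become `test += floatToFixed(1.00f);` followed by `Serial.println(fixedToFloat(test));`, with `test` still declared as an int32_t.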