I am working on a project that needs fixed-point math, and I cannot figure out why the numbers are "rolling over". I only got a large enough range by changing the shift amount from 16 to 8 and finally to 4. Here is the code I am using at present:
#define SHIFT_AMOUNT 8
#define SHIFT_MASK ((1 << SHIFT_AMOUNT) - 1)
#define FIXED_ONE (1 << SHIFT_AMOUNT)
#define INT2FIXED(x) ((x) << SHIFT_AMOUNT)
#define FLOAT2FIXED(x) ((int)((x) * (1 << SHIFT_AMOUNT)))
#define FIXED2INT(x) ((x) >> SHIFT_AMOUNT)
#define FIXED2FLOAT(x) (((float)(x)) / (1 << SHIFT_AMOUNT))
int32_t test = FLOAT2FIXED(1.00);
void setup()
{
    Serial.begin(57600);
}
void loop()
{
    test += FLOAT2FIXED(1.00);
    Serial.println(FIXED2FLOAT(test));
}
And the output:
1
2
3
...
127
-128
-127
-126
When SHIFT_AMOUNT = 8 I am only able to store values from -128 to 127. But since I am using a 32-bit variable, shouldn't a 16-bit shift move the decimal point to the "middle", leaving two 16-bit sections, one for the whole number and the other for the fraction? Shouldn't the whole range of the int32_t be −2,147,483,648 to 2,147,483,647 with the shift at 16? Is there a setting that I am missing, or am I just way off with how this works?
If SHIFT_AMOUNT = 4 I get the range that I need, but this doesn't seem right, since all the other examples that I have seen online use a 16-bit shift.
Here is a link showing what I am looking to do
If I have this right, shifting by 8 bits in a 16-bit-wide type leaves 8 bits for the whole part and 8 for the fractional part, giving a range of -128 to 127. Hence the need for the 4-bit shift, which leaves 12 bits for the whole part and widens its range to -2,048 to 2,047. Is this correct? If so, is int32_t not truly 32 bits wide?
Patrick Trentin pointed out where I was going wrong. Everything was correct except for the part I copied from the linked question: I was casting to an int, not an int32_t. The int type is 16 bits wide, hence having to use a shift of 4 to get the range I needed.
Change this:
#define FLOAT2FIXED(x) ((int)((x) * (1 << SHIFT_AMOUNT)))
into this:
#define FLOAT2FIXED(x) ((int32_t)((x) * (1 << SHIFT_AMOUNT)))
Rationale: the size of int is 16 bits on an Arduino Uno (see the documentation), so the cast caps the values that you are storing within your int32_t variable to 16 bits.
EDIT:
The fact that int16_t is an alias of signed int, which is an alias for int, can be corroborated by either looking at the online documentation or at the content of the file
arduino-version/hardware/tools/avr/lib/avr/include/stdint.h
among the Arduino Uno sources:
/** \ingroup avr_stdint
16-bit signed type. */
typedef signed int int16_t;