I'd like to start out by saying this isn't about optimizations, so please refrain from dragging this topic down that path. I want to use fixed-point arithmetic because I want to control the precision of my calculations without using floating point.
With that said, let's move on. I wanted 15 bits for the integer range and 16 bits for the fractional part; the extra bit is for the sign. Here are my macros:
const int scl = 16; // number of fractional bits
#define Double2Fix(x) ((x) * (double)(1 << scl))
#define Float2Fix(x) ((x) * (float)(1 << scl))
#define Fix2Double(x) ((double)(x) / (1 << scl))
#define Fix2Float(x) ((float)(x) / (1 << scl))
Addition and subtraction are fairly straightforward, but things get a bit tricky with multiplication and division.
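For example, as long as both operands use the same scale, add and subtract are plain integer operations (fix_add and fix_sub are just names I made up):
// both operands carry the same 2^scl scale, so the scale carries straight through
int fix_add(int a, int b) { return a + b; }
int fix_sub(int a, int b) { return a - b; }
// but a * b would be scaled by 2^(2*scl): the raw product has to be
// shifted back down by scl bits, and it can easily overflow 32 bits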
I've seen two different ways to handle these operations. 1) If I'm using 32 bits, use a temporary 64-bit variable to store the intermediate product, then scale back down at the end.
2) Scale both operands down to a smaller bit range right in the multiplication step, before multiplying. For example, if you have a 32-bit register with 16 bits for the whole number, you could shift like this:
(((a)>>8)*((b)>>6) >> 2) or some combination that makes sense for your app; both approaches are sketched below.
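In code, the two approaches come out something like this (fixmul_wide and fixmul_narrow are just names I made up; the pre-shift counts match the 16-fractional-bit example above):
/* 1) widen to 64 bits for the intermediate product, then scale back down */
int fixmul_wide(int a, int b)
{
    return (int)(((long long)a * b) >> scl);
}

/* 2) pre-shift the operands so the product fits in 32 bits;
      the two pre-shifts plus the final shift must add up to scl (8 + 6 + 2 = 16) */
int fixmul_narrow(int a, int b)
{
    return ((a >> 8) * (b >> 6)) >> 2;
}
The narrow version simply throws away 8 and 6 low fraction bits of a and b before the multiply, which is exactly the precision loss I'm worried about.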
It seems to me that if you design your fixed-point math around 32 bits, it might be impractical to always depend on having a 64-bit variable available to store your intermediate values. On the other hand, shifting to a lower scale will seriously reduce your range and precision.
Questions: Since I'd like to avoid forcing the CPU to create a 64-bit type in the middle of my calculations, is shifting to lower bit values the only other alternative?
Also, I've noticed this:
int b = Double2Fix(9.1234567890);
printf("double shift:%f\n",Fix2Double(b));
int c = Float2Fix(9.1234567890);
printf("float shift:%f\n",Fix2Float(c));
double shift:9.123444
float shift:9.123444
Is that precision loss just a part of using fixed point numbers?
Since I'd like to avoid forcing the CPU to create a 64-bit type in the middle of my calculations, is shifting to lower bit values the only other alternative?
You have to work with the hardware capabilities, and the only available operations you'll find are:
1. Multiply N x N -> low N bits of the product (what plain C multiplication of two same-width integers gives you).
2. Multiply N x N -> high N bits of the product.
3. Multiply N x N -> the full 2N-bit product (a widening multiply).
If the instruction set has #3, and the CPU implements it efficiently, then there's no need to worry about the extra-wide result it produces. For x86, you can pretty much take these as a given. Anyway, you said this wasn't an optimization question :).
Sticking to just #1, you'll need to break the operands into pieces of N/2 bits and do long multiplication, which is likely to generate more work. There are still cases where it's the right thing to do, for instance implementing #3 in software (extended-precision arithmetic) on a CPU that has neither #3 nor #2.
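For illustration, here's roughly what that #1-only route looks like for 32-bit operands split into 16-bit halves. It's an unsigned sketch with made-up names; handling the sign (e.g. multiplying absolute values and fixing the sign afterwards) is left out:
/* 32 x 32 -> 64 multiply using only 32-bit operations (#1),
   producing the product as a high/low pair */
void umul32_wide(unsigned a, unsigned b, unsigned *hi, unsigned *lo)
{
    unsigned a_lo = a & 0xFFFFu, a_hi = a >> 16;
    unsigned b_lo = b & 0xFFFFu, b_hi = b >> 16;

    unsigned p0 = a_lo * b_lo;   /* contributes at bit 0  */
    unsigned p1 = a_lo * b_hi;   /* contributes at bit 16 */
    unsigned p2 = a_hi * b_lo;   /* contributes at bit 16 */
    unsigned p3 = a_hi * b_hi;   /* contributes at bit 32 */

    /* sum the middle column; at most 3 * 0xFFFF, so this can't overflow */
    unsigned mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    *lo = (p0 & 0xFFFFu) | (mid << 16);
    *hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
}

/* fixed-point multiply on top of it: with 16 fractional bits,
   the result is bits 16..47 of the 64-bit product */
unsigned fixmul_long(unsigned a, unsigned b)
{
    unsigned hi, lo;
    umul32_wide(a, b, &hi, &lo);
    return (hi << 16) | (lo >> 16);
}
That's four multiplies plus a pile of shifts and adds in place of one widening multiply, which is why you only go this route when the hardware leaves you no choice.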
Is that precision loss just a part of using fixed point numbers?
log2( 9.1234567890 – 9.123444 ) = –16.25, and you used 16 bits of precision, so yep, that's very typical.
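To make the quantization concrete, here's a standalone snippet (not from your question) that prints the two nearest representable values around 9.1234567890 with 16 fractional bits:
#include <stdio.h>

int main(void)
{
    const int scl = 16;
    double x = 9.1234567890;
    int fx = (int)(x * (1 << scl));            /* truncates, like Double2Fix */
    double below = (double)fx / (1 << scl);     /* nearest representable value below x */
    double above = (double)(fx + 1) / (1 << scl); /* next representable value above x */
    printf("below: %.7f  above: %.7f  step: %g\n", below, above, 1.0 / (1 << scl));
    return 0;
}
Every representable value is a multiple of 2^-scl, so the best you can do is land on whichever multiple is closest. Rounding instead of truncating in Double2Fix (adding 0.5 before the cast, for positive values) cuts the worst-case error in half, but the spacing between representable values stays 2^-scl either way.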