c++ c signal-processing fixed-point

Fixed point power function


I have a question about how to handle some fixed-point calculations. I know this is easy in floating point, but I want to figure out how to do it in fixed point.

I have a fixed-point system where I am applying the following equation to a signal (vSignal):

Signal_amplified = vSignal * 10^Exp

vSignal has a maximum amplitude of around 4e+05.

The system can represent signal values up to 2.1475e+09 (signed 32-bit), so there is some headroom for Signal_amplified.

For simplicity, let's just assume Exp can go from 0 to 10.

Let's say the first value is 2.8928. This works well in floating point, since the expression 10^2.8928 evaluates to about 781. Using the rounded value 781, I get signal amplitudes of 3.0085e+08, well within the signal range.

If I try to represent the value 2.8928 in a Q format, let's say Q12, the value becomes 11849. Now 10^11849 results in overflow.

How should one handle these large numbers? I could use another format such as Q4, but even then the numbers get very large and my precision becomes poor. I would very much like to be able to calculate with a precision of 0.001, but I just can't see how this should be done.

Minimal Working Example:

#include <cmath>

int vSignal = 400000;

// Floating point -> goes well
double dExp = 2.89285;
double dSignal_amplified = vSignal * std::pow(10, dExp);

// Fixed point -> overflow: the raw Q12 integer is used as the exponent
int iExp = 11849; // 2.89285 in Q12 format (2.89285 * 4096, rounded)
int iSignal_amplified = vSignal * std::pow(10, iExp); // 10^11849 overflows
iSignal_amplified = iSignal_amplified >> 12;

Any ideas?


Solution

  • "If I try to represent the value 2.8928 in a Q format, let's say Q12, the value becomes 11849. Now 10^11849 results in overflow."

    Mixed-type math is pretty hard, and it looks like you should avoid it. The integer 11849 is only the raw Q12 encoding of 2.8928 (11849 / 4096), so it cannot be passed to an integer pow as if it were the exponent itself. What you want is pow(Q12(10.0), Q12(2.8928)) or possibly an optimized pow10(Q12(2.8928)). For the first, see my previous answer. The latter can be optimized with a hardcoded table of powers. pow10(2.8928) is of course pow10(2) * pow10(.5) * pow10(.25) * pow10(.125) * ... where each 1 in the binary representation of 2.8928 corresponds to a single table entry. You may want to calculate the intermediate results in Q19.44 and drop the lowest 32 bits when you return.
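A minimal sketch of the table-based pow10 idea, under several assumptions that are not in the answer: the names (frac_table_entry_q28, pow10_frac_q28, amplify_q12) are hypothetical, the exponent is non-negative, and intermediates use Q28 in 64-bit integers rather than Q19.44, because a Q44 by Q44 product would need a 128-bit multiply. The table entries are generated with std::pow only to keep the example short; on a target without floating point they would be precomputed constants.

```cpp
#include <cmath>
#include <cstdint>

constexpr int kExpFracBits = 12;  // the exponent is Q12
constexpr int kAccFracBits = 28;  // table entries and accumulator are Q28 (assumption)

// Returns 10^(2^-(i+1)) in Q28 for i = 0..11 (10^0.5, 10^0.25, ...).
// Assumption: generated with floating point here for brevity only;
// a real fixed-point build would hardcode these 12 constants.
inline std::uint32_t frac_table_entry_q28(int i) {
    const double value = std::pow(10.0, std::ldexp(1.0, -(i + 1)));
    return static_cast<std::uint32_t>(std::llround(std::ldexp(value, kAccFracBits)));
}

// Computes 10^frac in Q28, where frac is the fractional part (low 12 bits)
// of a Q12 exponent. Each set bit selects one table entry.
inline std::uint64_t pow10_frac_q28(std::uint32_t exp_q12) {
    std::uint64_t acc = std::uint64_t{1} << kAccFracBits;  // 1.0 in Q28
    for (int i = 0; i < kExpFracBits; ++i) {
        // Weight 2^-(i+1) lives in bit (11 - i) of the Q12 exponent.
        if (exp_q12 & (1u << (kExpFracBits - 1 - i))) {
            // Q28 * Q28 = Q56; round and shift back to Q28.
            // acc < 10 * 2^28 and entry < 4 * 2^28, so the product fits in 64 bits.
            acc = (acc * frac_table_entry_q28(i) +
                   (std::uint64_t{1} << (kAccFracBits - 1))) >> kAccFracBits;
        }
    }
    return acc;
}

// Computes signal * 10^exp for a non-negative Q12 exponent.
// Assumption: signal * 10^floor(exp) still fits in 31 bits, as in the question.
// Note: >> on a negative value is an arithmetic shift on common compilers
// (guaranteed only from C++20 on).
inline std::int64_t amplify_q12(std::int32_t signal, std::uint32_t exp_q12) {
    std::int64_t scaled = signal;
    for (std::uint32_t n = exp_q12 >> kExpFracBits; n > 0; --n) {
        scaled *= 10;  // integer part of the exponent: exact powers of ten
    }
    const std::int64_t gain = static_cast<std::int64_t>(pow10_frac_q28(exp_q12));
    return (scaled * gain) >> kAccFracBits;
}
```

With the values from the question, amplify_q12(400000, 11849) comes out at roughly 3.125e+08, in line with the floating-point calculation.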

    Edit: Precision

    Storing all the values of pow10(2^-n) up to n=12 has the slight problem that the last result is close to 1, namely 1.000562312. If you stored that as a Q12, you would lose precision to rounding. Instead, it may be wise to store the value of pow10(2^-12) as a Q24, the value of pow10(2^-11) as a Q23, etc. Now evaluate Q12 pow10(Q12 exp) starting at the LSB of exp, not the MSB. You need to repeatedly shift the intermediate results as you move up to pow10(0.5), but half of the time you can merge that with the >>12 that's inherent to Q12 multiplication.
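One possible reading of the LSB-first scheme, as a hedged sketch rather than the answer's exact method: the entry for pow10(2^-n) is stored with 12 + n fractional bits, the accumulator starts as 1.0 in Q24, and it loses one fractional bit per step so that it ends in Q12 after the pow10(0.5) step. The function names are hypothetical, and the table is again generated with std::pow purely for brevity.

```cpp
#include <cmath>
#include <cstdint>

// Returns 10^(2^-n) for n = 1..12, stored with (12 + n) fractional bits,
// i.e. Q24 for n = 12, Q23 for n = 11, ..., Q13 for n = 1.
// Assumption: computed with floating point here; normally hardcoded.
inline std::uint32_t pow10_entry_var_q(int n) {
    const double value = std::pow(10.0, std::ldexp(1.0, -n));
    return static_cast<std::uint32_t>(std::llround(std::ldexp(value, 12 + n)));
}

// Computes 10^frac in Q12, where frac is the low 12 bits of a Q12 exponent,
// walking the exponent bits from the LSB (weight 2^-12) to the MSB (weight 2^-1).
inline std::uint32_t pow10_frac_lsb_first_q12(std::uint32_t exp_q12) {
    std::uint64_t acc = std::uint64_t{1} << 24;  // 1.0 in Q24
    for (int n = 12; n >= 1; --n) {
        // Before this step acc is Q(12 + n); afterwards it is Q(11 + n).
        if (exp_q12 & (1u << (12 - n))) {
            // Q(12+n) * Q(12+n) = Q(24+2n); one rounded shift both undoes
            // the multiplication scaling and drops the extra fractional bit.
            const int shift = 12 + n + 1;
            acc = (acc * pow10_entry_var_q(n) +
                   (std::uint64_t{1} << (shift - 1))) >> shift;
        } else {
            acc = (acc + 1) >> 1;  // no multiply: just drop one fractional bit
        }
    }
    return static_cast<std::uint32_t>(acc);  // Q12, value in [1, 10)
}
```

The early steps, where the partial product is still very close to 1, run at the highest resolution, which is where a plain Q12 table would lose the most to rounding.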