ARM CMSIS giving wrong output for q15 FFT

I am programming an STM32 cortex M4 microcontroller. I made a simple circuit to offset a sinusoid signal so that it varies (at its maximum output) from o to 3v3, basically I built a circuit to give the sinusoidal signal a DC offset because the mirocontroller's ADC does not support negative voltages.

I am sampling at a frequency of approximately 10kHz and I programmed the DMA to do 128 acquisitions before generating an interruption so that I can treat my data.

What I want to do

Perform the 128 point FFT on my signal and then some other calculations (RMS value, power, etc). Basically I have two signals, one for voltage and one for current and I want to do all the electrical calculations using the FFT. I implemented this in python and it works.

What I did

RMS value of an FFT of a signal is

import numpy as np
RMS = np.sqrt(np.sum(signal.real * signal.real + signal.imag * signal.imag))/N

where N is the number of data points you are using and signal is any signal ... in order to ignore the DC offset you have to do the following

RMS = np.sqrt(np.sum(signal[1:].real * signal[1:].real + signal[1:].imag * signal[1:].imag))/N

Here is my C code using the CMSIS DSP library

q15_t adcData[256] = {0}; /*dma reads 128 points into this buffer*/

q31_t auxRms = 0;
q31_t auxReal = 0;
q31_t auxImag = 0;

q31_t sqrt = 0;
float result = 0.0F;
int i;

arm_cfft_q15(&arm_cfft_sR_q15_len128, adcData, 0, 1);
for(i = 1; i < N; i++) {
    auxReal = adcData[2*i];
    auxImag = adcData[2*i+1];
    auxReal = auxReal * auxReal;
    auxImag = auxImag * auxImag;
    auxRms += auxImag + auxReal;
}

arm_sqrt_q31(interim_rms, &sqrt);
arm_q31_to_float(&sqrt, &result, 1);

The same thing as in python, square the real value, square the imaginary value sum them and accumulate, but since this is q15 math, I did the q15 square root and the transformed it to float, which gives me an output between -1 and 1 but this does not work

It doesn't matter if I insert a signal with its maximum output or if I leave just the DC signal, the result is basically the same, which is wrong ... very wrong.

In order to test if this was working I created my own sine wave in the microcontroller and the I did the same to it. Here's my code:

float var = 0.0F;
float fbuff[256] = {0};
for(i = 0; i < 128; i++) {
    var = 0.03125 * arm_sin_f32(2 * PI * i / 128);
    fbuff[i] = var;
    arm_float_to_q15(&var, &buff[i], 1);
    buff[i] += 2048;
}

With this I get a sine wave that imitates the ADC gicing its maximum value for the signal, since it goes from 0 to 4095 (it's a 12 bit ADC) and the DC offset is 2048. The only difference is that I took the same created signal and did the same calculations and also implemented the same function in python and compared the results.

IN python the RMS value was 0.02209708691207961 In the float calculations I got 0.176776692, which is correct ... because I need to get this value, divide it by 128 multiply it by 8 and then multiply it by 2. This is because the given value of the cfft needs to be dampened (probably not the correct term) by N samples (this case 128, but if using a 64 sample FFt then 64 and so on) which gives me the exact same value as the python function. In the q15 function I got 0.489307404 which is about 22 times greater than the expected value so ... obviously wrong. Even when I compare the output of the float FFT with the q15 FFT of the "perfect" sinusoidal signal it seems wrong ... to notice this I checked the float FFT result and the q15 FFT result as well as the python FFt result.

What else have I tried?

Since the above failed I tried the following. Instead of doing my own RMS calculation I used the arm_rms_q15 function which should receive a 128 samples q15 buffer and output a q15 RMS value, which only gives me a big fat 0 (certainly due to saturation instructions) and I also tried converting my entire buffer into a float buffer using the arm_q15_to_float(&src, dst, size) and the doing the FFT BUT the resulting buffer is obviously wrong.

I have recently tested this again .. I created the float sinusoidal wave and then converted it into q31 and did the FFT and the result was good, it was dampened by a factor of 128 in comparison to the float output but the output was the correct value. I even tried creating the sine wave in floating point, convert it to q15 and then convert it to q31 from the q15 value and the result of the q31 FFT was the same. So I can only assume there is a bug in the q15 implementation of the FFT, which I might have introduced myself when I changed PCs, since a few months ago I was able to use the q15 FFT and the results were OK (yeah I know ... I forgot to mention something as important as that, but I honestly just remembered it). So I'll get to looking for this problem again on monday and see if there is a bug in the "new" CMSIS library I'm using (I might have changed versions when I changed PCs) or if I might have changed a configuration accidentally ... but please if anyone might know what is wrong please let me know before I go bug hunting ... it will certainly be faster this way.

An example of input data and FFT result for the "perfect sinusoidal", the first one is the sample array and the second one is the FFT array using the q15 algorithm

The values of my created signal are

[ 02048  02098  02148  02198  02248  02297  02345  02393
 02440  02486  02531  02574  02617  02658  02698  02736
 02772  02807  02840  02870  02899  02926  02951  02974
 02994  03012  03028  03041  03052  03061  03067  03071
 03072  03071  03067  03061  03052  03041  03028  03012
 02994  02974  02951  02926  02899  02870  02840  02807
 02772  02736  02698  02658  02617  02574  02531  02486
 02440  02393  02345  02297  02248  02198  02148  02098
 02048  01998  01948  01898  01848  01799  01751  01703
 01656  01610  01565  01522  01479  01438  01398  01360
 01324  01289  01256  01226  01197  01170  01145  01122
 01102  01084  01068  01055  01044  01035  01029  01025
 01024  01025  01029  01035  01044  01055  01068  01084
 01102  01122  01145  01170  01197  01226  01256  01289
 01324  01360  01398  01438  01479  01522  01565  01610
 01656  01703  01751  01799  01848  01898  01948  01998
]

and the resulting FFT is

[  01020  01020  00872 -00424  00252 -00248  00108 -00334
-00002  00000  00116 -00148 -00004 -00004  00092 -00094
 00000  00000  00078 -00066 -00002  00000  00066 -00050
 00000  00000  00058 -00040 -00002 -00002  00052 -00030
 00000  00000  00048 -00026 -00002  00000  00044 -00020
-00002  00000  00040 -00016  00000  00000  00038 -00012
 00000  00000  00036 -00010 -00002 -00002  00034 -00008
 00000  00000  00032 -00006 -00002  00000  00030 -00004
 00000  00000  00030  00000  00000 -00002  00028  00002
-00002  00000  00028  00002  00000  00000  00026  00004
 00000  00000  00026  00006  00000  00000  00024  00006
 00000  00000  00022  00008  00000  00002  00022  00008
 00000  00000  00022  00008 -00002 -00002  00020  00010
-00002  00000  00018  00010  00000  00000  00018  00012
 00000  00000  00018  00012  00000  00000  00016  00014
 00000  00000  00016  00014  00000  00000  00016  00014
 00000  00000  00016  00016  00000  00000  00014  00018
-00002  00000  00014  00018  00000  00000  00012  00018
 00000  00000  00012  00020  00000  00000  00010  00020
 00000  00000  00008  00022  00000 -00002  00008  00020
 00000  00000  00008  00022  00000  00000  00006  00022
-00002  00000  00004  00024  00000  00000  00004  00024
 00000  00000  00004  00026 -00002  00000  00002  00026
 00000  00000  00000  00028 -00002  00000  00000  00030
 00000  00000 -00002  00032  00000 -00002 -00004  00032
-00002  00000 -00006  00036  00000  00000 -00010  00036
 00000  00000 -00012  00040  00000 -00002 -00014  00042
 00000  00000 -00020  00046  00000  00000 -00024  00048
 00000  00000 -00030  00052 -00002  00000 -00038  00060
-00002  00000 -00050  00066  00000  00000 -00066  00078
 00000  00000 -00094  00092 -00002  00000 -00150  00114
 00000  00000 -00342  00102 -00256  00268 -00414  00884]

EDIT:

I forgot to mention that the ADC is inputting valid data, I get a nice sinusoidal wave ranging from 500 to 3500 approximately. But this is not that important since it is possible to see that the FFT q15 algorithm is not working even in the "perfect wave"

Solution

Ok, sorry it took me such a long time to post my solution to the problem.

I actually had 2 problems.

One of them was that I was using a very small input, and it worked for the software generated sine wave because it was juuuust inside the minimum value. By minimum value I mean that there is a dampening of the inputs in order to not cause an overflow, in the case of 128 data points, the inputs were practically zero most of the time, thus the wrong output. This means that 12 bits is probably not enough resolution, I would recommend shifting the ADC register value 4 bits to the left (multiply by 16) and then use that value to do the calculations, or just left align the ADC register. Granted this will give you an inverted sine wave, but you can just flip the 15th bit, or just work with the wave as is and invert the results (if necessary ... for example RMS value will remain the same, but some other calculations may be inverted).

The second problem I was facing is that I was not using a complex array as an input to the FFT (index0 = ADc data shifted to the left, index 1 = zero, index2 = ADC data shifted to the left, ... and so on), just as @J.R.Schweitzer stated in his comment.

My solution: Use the q31 FFT with the data shifted 19 bits to the left and a complex input buffer as mentioned in the precious paragraph. I didn't notice any difference in speed and the q31 output gives me a better result.

Hopefully this will help someone elese in the future