c real-time arduino-uno cross-correlation isr

How to do Fast Cross-Correlation in Arduino (Part of Real Time Instant Signal Recognizer)

I'm having a hard time to write a code which samples a signal (in pin A0) and does a Real-Time Instant CrossCorrelation with another known signal (which is saved in the flash memory of the Arduino Uno).

My problem is that my code (my equation) includes lots of loops which makes it hard on the arduino to do it in real time and instantly after sampling a signal (using ISR(TIMER1_COMPA_vect)).

Is there any idea of how to implement the cross correlation so it wont take lots of time each sample?

Cross Correlation Equation:

for the record, here is what I have done already:

#define NUM_OF_SAMPLES 64
#define NUM_OF_ETALONS 6

const double* const Etalon_Signals[NUM_OF_ETALONS] PROGMEM = {sinWave1, sinWave2, sinWave3, rectWave4, rectWave5, triWave6};
const double Etalon_Signals_Mean[NUM_OF_ETALONS] PROGMEM = {sinWaveMean1, sinWaveMean2, sinWaveMean3, rectWaveMean4, rectWaveMean5, triWaveMean6};
const double Etalon_Signals_Variance[NUM_OF_ETALONS] PROGMEM = {sinWaveVariance1, sinWaveVariance2, sinWaveVariance3, rectWaveVariance4, rectWaveVariance5, triWaveVariance6};
//-------------- SAMPLED SIGNAL ---------------------
volatile double SignalSamples[NUM_OF_SAMPLES];
volatile int idx = 0;
//-------------- CORRELATION VARS -------------------
// Correlations[NUM_OF_ETALONS] -         array of 6 values which calculate the correlation for particular 'd' (delay) value.
// MaxCorrelationValues[NUM_OF_ETALONS] - array which holds the maximum value of a praticular Etalon signal,
//                                        which I use later to decide which wave is most simmilar to the sampled wave.
//                                        MaxCorrelationValues[k] = the maximum correlation value out of all the 'd' iterations, 
//                                        of Etalon wave number k (look at the formula for better understanding).
// MostSimillarCorrVal -                  Holds the Maximum value of correlation out of 6 Etalons (every Etalon has a maximum correlation point out of d correlations)
//                                        MostSimillarCorrVal will hold the Absulote Maximum out of the 6 Etalons.
// SignalSamplesVar -                     Variance of the Sampled Signal -> Eq: (sum(y[i] - my))
// SignalSamplesMean -                    Mean of the Sampled Signal -> Eq: (my)
// counter -                              used for smart Circular scanning over the ring buffer "SignalSamples".
// maxCorrWave -                          Holds the 2^index of the most simmilar wave out of the etalons.
// SignalSamplesVarTemp -                 a particular substraction of the signal with its mean value -> Eq: (numerator) -> (y(i-d)-my).
volatile double Correlations[NUM_OF_ETALONS] = {0}, MaxCorrelationValues[NUM_OF_ETALONS] = {0}, MostSimillarCorrVal = 0;
volatile double SignalSamplesVar = 0, SignalSamplesMean = 0;
volatile int counter = 0, maxCorrWave = 0;
double EtalonVarianceVal[NUM_OF_ETALONS], SignalSamplesVarTemp;


void setup()
{
  SetTimer1CTCforFreq(SAMPLING_FREQUENCY);
  DDRB |= WAVE_1 | WAVE_2 | WAVE_3 | WAVE_4 | WAVE_5 | WAVE_6;
  pinMode(A0, INPUT);
  for(int i = 0; i<NUM_OF_ETALONS;i++)
    EtalonVarianceVal[i] = pgm_read_float(&Etalon_Signals_Variance[i]);
}

ISR(TIMER1_COMPA_vect)
{
  SignalSamplesVar -= (SignalSamples[idx] - SignalSamplesMean)*(SignalSamples[idx] - SignalSamplesMean);
  SignalSamplesMean = SignalSamplesMean * NUM_OF_SAMPLES - SignalSamples[idx];
  SignalSamples[idx] = analogRead(ANALOG_INPUT_PIN);                        // sampling
  SignalSamplesMean = (SignalSamplesMean + SignalSamples[idx])/NUM_OF_SAMPLES;
  SignalSamplesVar +=  (SignalSamples[idx] - SignalSamplesMean)*(SignalSamples[idx] - SignalSamplesMean);
  // SignalSamplesMean = Mean(SignalSamples, NUM_OF_SAMPLES);                  // Mean of current state calculations.
  // SignalSamplesVar = VarianceSquared(SignalSamples, NUM_OF_SAMPLES);        // Variance of current state calculations.
  for(int delay = -NUM_OF_SAMPLES; delay < NUM_OF_SAMPLES; delay++)         // "shifting" array loop.
  {
    counter = 0;                                                            // counter - an index which runs on SignalSamples array.
    while(counter < NUM_OF_SAMPLES)                                         // loop running on the SignalSamples in circle.
    {
      // SignalSamplesVarTemp = Signal's "Variance" semi-calculation.
      SignalSamplesVarTemp = SignalSamples[(idx + counter - delay)%NUM_OF_SAMPLES] - SignalSamplesMean;
      // Calculating all Cross Correlations with all the six Etalons.
      for(int k = 0; k < NUM_OF_ETALONS; k++)
        Correlations[k] += pgm_read_float(&Etalon_Signals[k][counter]) * SignalSamplesVarTemp; // eMean = 0 always.
      counter++;
    }
    for(int k = 0; k < NUM_OF_ETALONS; k++) // updating all the Etalons.
    {
      Correlations[k] *= Correlations[k];
      Correlations[k] /= SignalSamplesVar * pgm_read_float(&EtalonVarianceVal[k]);        // The Correlation for particular delay.
      // MaxCorrelationValues[k] - Max value of the correlation with Etalon[k]. 
      // At the end we compare which etalon got the biggest correlation value (which is positive).
      MaxCorrelationValues[k] = MaxCorrelationValues[k] < Correlations[k] ? Correlations[k] : MaxCorrelationValues[k]; 
    }
  }
  for(int k = 0; k < NUM_OF_ETALONS; k++)
  {
    if(MaxCorrelationValues[k] > MostSimillarCorrVal)
    {
      MostSimillarCorrVal = Correlations[k];
      maxCorrWave = k;
    }
  }
  PORTB = 1 << maxCorrWave;
  idx = (idx+1)%NUM_OF_SAMPLES;
}

Thanks for those who help

Solution

I will try to be generic - the solution still being applicable to your problem.

NOTE: I look to optimize the equation, not the code. I do not understand how the code relates to the equation.

Let's assume you need to only find the sum of the last N samples of the signal.

Let the samples be

x0, x1, ..., x(N-1)

Let the first sum be

S0 = x0 + x1 + ... + x(N-1)

Then the new sample arrives:

xN

The sum being

S1 = x1 + x2 + ... + x(N-1) + xN

But this sum can be rewritten as:

S1 = S0 - x0 + xN

Thus, you only need to remember 3 numbers:

previous / current sum (S0);
oldest sample value (x0);
newest sample value (xN).

Therefore, instead of calculating (N-2) additions, you calculate only 2 (actually only an addition and a subtraction).

If you need to calculate the sum of

x1 - xm, x2 - xm ...

you need to notice that when calculation S1 you will have

S1 = S0 - (x0-xm) + (xN-xm)

which translates to

S1 = S0 - x0 + xm

which saves you adding and subtracting xm unnecessarily.

With the proper modifications, you can have the entire equation calculated in just a few steps, every time a sample arrives.

Except the sqrt() - I have no idea how to help there.

Please note that there will be no heavy work even at the beginning. You start with all samples at value zero. As the real samples come, "the first" S0 is built - which will be used subsequently.

Actually, "the first" S0 will be

S0 = x0

Another optimization would be to calculate

y(i-d) - my

once, and use it twice. It is not much, but still it will help. Also, if you are sure that mx will not change, take it out of the formulas completely. Extracting zero is just time consuming for no benefit.

Another trick, if the application allows, would be decrease the sampling rate. In that way, the samples (and therefore the interrupts) come slower, and the processor has more time to do the job.