Search code examples
c#vectorfftmathnet-numericsautocorrelation

How can I apply FFT in the case of vector data?


I wrote the following class to compute autocorrelation of a List of 3d vectors.

I took the formula from this link

public static class AutocorrVec3
{
    private static double C(int t, List<Vec3> vectors)
    {
        int n = vectors.Count;
        if (t >= n || t < 0)
            throw new ArgumentException("Invalid value for t. It must be between 0 and n-1.");

        double sum = 0;
        for (int i = 0; i < n - t; i++)
        {
            sum += vectors[i].Dot(vectors[i + t]);
        }

        return sum / (n - t);
    }

    public static (List<double> taus, List<double> autocorrs) 
        GetAutoCorrelationPoints(List<Vec3> vectors, int maxLag)
    {
        var tValues = new List<double>();
        var cResults = new List<double>();

        double c0 = C(0, vectors); // This is the normalization factor
        Console.WriteLine($"Normalization factor: {c0}");

        for (int lag = 0; lag <= maxLag; lag++)  // Start from 0 to include the autocorrelation at lag 0
        {
            try
            {
                double cValue = C(lag, vectors);

                Console.WriteLine($"Lag={lag}, Raw Autocorr={cValue}, Normalized Autocorr={cValue / c0}");
                tValues.Add(lag);
                cResults.Add(cValue / c0); // Normalization is done here
            }
            catch (ArgumentException ex)
            {
                Console.WriteLine(ex.Message);
                break;
            }
        }
        return (tValues, cResults);
    }
}

The problem is, GetAutoCorrelationPoints() is terribly slow. For example, I need to convert 24 files each containing 10000000 3d vectors. After 24 hours, it couldn't finish computing even one data file.

How can I apply FFT in this case?

I want to use MathNet.Numerics.

using System;
using System.Collections.Generic;
using System.Numerics;
using MathNet.Numerics.IntegralTransforms;


namespace FourierTransformCSharp
{
    public static class AutocorrVec3
    {
        // Compute the autocorrelation of a time series using FFT
        public static double[] ComputeAutocorrelationUsingFFT(List<Vec3> vectors)
        {
            int n = vectors.Count;
            // Create a zero-padded list double the size of the original for FFT
            var paddedVectors = new Complex[2 * n];
            for (int i = 0; i < n; i++)
            {
                // Convert vector to complex number with magnitude as real part
                paddedVectors[i] = new Complex(vectors[i].Magnitude(), 0);
            }
            for (int i = n; i < 2 * n; i++) // Zero padding
            {
                paddedVectors[i] = Complex.Zero;
            }

            // Perform FFT on the zero-padded list
            Fourier.Forward(paddedVectors, FourierOptions.Default);

            // Compute power spectrum (magnitude squared)
            for (int i = 0; i < paddedVectors.Length; i++)
            {
                var magnitude = paddedVectors[i].Magnitude;
                paddedVectors[i] = new Complex(magnitude * magnitude, 0);
            }

            // Perform Inverse FFT to obtain the autocorrelation function
            Fourier.Inverse(paddedVectors, FourierOptions.Default);

            // Extract the real parts as the autocorrelation result
            var autocorr = new double[n];
            for (int i = 0; i < n; i++)
            {
                autocorr[i] = paddedVectors[i].Real;
            }

            // Normalize the autocorrelation result
            var normalizationFactor = autocorr[0];
            for (int i = 0; i < n; i++)
            {
                autocorr[i] /= normalizationFactor;
            }

            return autocorr;
        }

        // Calculate autocorrelation of vector time series at lag t
        public static double C(int t, List<Vec3> vectors)
        {
            double[] autocorr = ComputeAutocorrelationUsingFFT(vectors);
            return autocorr[t];
        }

        // Get autocorrelation values for lags from 0 to maxLag
        public static (List<int> taus, List<double> autocorrs) GetAutoCorrelationPoints(List<Vec3> vectors, int maxLag)
        {
            List<int> taus = new List<int>();
            List<double> autocorrs = new List<double>();
            double[] autocorr = ComputeAutocorrelationUsingFFT(vectors);

            for (int t = 0; t <= maxLag && t < vectors.Count; t++)
            {
                taus.Add(t);
                autocorrs.Add(autocorr[t]);
            }

            return (taus, autocorrs);
        }

        // Parallel computation is unnecessary as FFT-based autocorrelation is already fast
        // Use GetAutoCorrelationPoints for parallel computation
    }

    public class Vec3
    {
        public double X { get; }
        public double Y { get; }
        public double Z { get; }

        public Vec3(double x, double y, double z)
        {
            X = x;
            Y = y;
            Z = z;
        }

        // Compute the magnitude of the vector
        public double Magnitude()
        {
            return Math.Sqrt(X * X + Y * Y + Z * Z);
        }
    }

    public class AutocorrelationDriver
    {
        public static void Main()
        {
            // Define your list of 3D vectors
            List<Vec3> vectors = new List<Vec3>
            {
                new Vec3(1, 1, 1),
                new Vec3(2, 2, 2),
                new Vec3(3, 3, 3),
                new Vec3(4, 4, 4),
                new Vec3(5, 5, 5)
            };

            // Compute the autocorrelation
            var aurocorr = AutocorrVec3.GetAutoCorrelationPoints(vectors, vectors.Count - 1);

            // Output the results
            Console.WriteLine("Lag\tAutocorrelation");
            for (int i = 0; i < aurocorr.taus.Count; i++)
            {
                Console.WriteLine($"{aurocorr.taus[i]}\t{aurocorr.autocorrs[i]}");
            }

            Console.ReadKey();
        }
    }
}

I have written the above code.

The correct output is as follows:

enter image description here

However, my written code gives the following output:

Lag     Autocorrelation
0       1
1       0.727272727272727
2       0.472727272727273
3       0.254545454545455
4       0.0909090909090911

How can I fix my code?


Solution

  • There is actually nothing wrong with your FFT based calculation as far as it goes. You have remembered to zero pad in all the right places (well done) and it gives the answer that I would expect for your test data.

    The difference between the two codes is explained by the fact that you have forgotten to normalise the output of the FFT computation by the number of contributing lagged components 1/(n-t).

    Applying that correction your FFT computation of correlation gives exactly the same answers as the longhand method on your toy test data.

    I'm not quite sure about the merit of applying a correlation function to the magnitude of a vector quantity but that is another matter entirely. Here is a table for the sample data.

    Data Lag t sum(a(i).a(i+t)) Norm C(0,v5) /(n-t) Norm C(t,v5)
    1 0 55 1 0.2 1
    2 1 40 0.727272727 0.181818182 0.909090909
    3 2 26 0.472727273 0.157575758 0.787878788
    4 3 14 0.254545455 0.127272727 0.636363636
    5 4 5 0.090909091 0.090909091 0.454545455

    The /(n-t) column is the result of dividing the previous column of raw correlation values by 5,4,3,2,1 respectively to correct for the number of components in the sum and the final column is the result renormalising that again to make zero lag t=0 equal to 1.0.

    • The last column is the result of normalising the correlation function so that the lag zero result is 1.0. So in this simple example multiply by 5 or easier for mental arithmetic divide by 0.2.

    Original slow code computing and summing the correlation terms directly. You apply a correction for the number of terms that you have summed over before returning it.

     public static class AutocorrVec3
     {
         private static double C(int t, List<Vec3> vectors)
        {
           int n = vectors.Count;
           if (t >= n || t < 0)
                throw new ArgumentException("Invalid value for t. It must be between 0 and n-1.");
    
           double sum = 0;
           for (int i = 0; i < n - t; i++)
           {
               sum += vectors[i].Dot(vectors[i + t]);
           }
    
        return sum / (n - t);  // ** this factor is missing in the FFT code!
     }
     [snip]
    

    Whereas in the FFT based code you just return the unmodified value that is called sum in the first sample code without correcting by the factor 1/(n-t) with the lag value of t.

        // Calculate autocorrelation of vector time series at lag t
        public static double C(int t, List<Vec3> vectors)
        {
            double[] autocorr = ComputeAutocorrelationUsingFFT(vectors);
            return autocorr[t]/(n-t); // change this line
        }
    
        // Get autocorrelation values for lags from 0 to maxLag
        public static (List<int> taus, List<double> autocorrs)     GetAutoCorrelationPoints(List<Vec3> vectors, int maxLag)
        {
            List<int> taus = new List<int>();
            List<double> autocorrs = new List<double>();
            double[] autocorr = ComputeAutocorrelationUsingFFT(vectors);
    
            for (int t = 0; t <= maxLag && t < vectors.Count; t++)
            {
                taus.Add(t);
                autocorrs.Add(autocorr[t]/(n-t));  // change this line
            }
    
            return (taus, autocorrs);
        }
         [snip]
    

    I hope this is now clear.

    You could further optimise it by having a method to return the Power in the vector rather than return magnitude = sqrt(X.X+Y.Y+Z.Z) and then squaring it if you return "X.X+Y.Y+Z.Z" it is oven ready for computing the power spectrum. I'm really not sure what physical interpretation this computation has though. You are computing the correlation function of the scalar length of the vector as a function of time lag.

    BTW You could almost double the speed and halve the memory requirements by using a real to complex conjugate symmetric version of the FFT. That avoids having to promote your real data to complex with zero imaginary components. Try it "as is" first though since Nlog N is a lot smaller than N^2 for largish N.

    But a factor of two speed improvement might still be worth the extra effort.