Search code examples
pythonarraysnumpymultidimensional-arrayfft

Computing the Fourier-transform of each column for each array in a multidimensional array


In the following 4D array, each column represents an attribute for machine learning model development.

import numpy as np
from scipy.fft import fft, fftfreq, fftshift

A = np.array([
            [[[0, 1, 2, 3],
              [3, 0, 1, 2],
              [2, 3, 0, 1],
              [1, 3, 2, 1],
              [1, 2, 3, 0]]],
            
            [[[9, 8, 7, 6],
              [5, 4, 3, 2],
              [0, 9, 8, 3],
              [1, 9, 2, 3],
              [1, 0, -1, 2]]],
            
            [[[0, 7, 1, 2],
              [1, 2, 1, 0],
              [0, 2, 0, 7],
              [-1, 3, 0, 1],
              [1, 0, 1, 0]]]
              ])

A.shape
(3, 1, 5, 4)

In this example, A has 3-instances, each instance represented by 4-features (1 for each column of the inner arrays (shaped (1,5,4)). Features are represented in time-domain.

I needed a handy way to calculate the frequency-domain (Fourier transform )for each feature of each instance.

Details

Considering the given example, I do it the following way.

  1. Get the array for all features (1 for each feature like this:

    feat1 = A[..., [0]] # the first feature
    feat2 = A[..., [1]] # the second feature
    feat3 = A[..., [2]] # 3rd feature etc...
    feat4 = A[..., [3]] # 4th feature etc...
    

    So that the first feature in the example data is:

    feat1
    array([[[[ 0],
         [ 3],
         [ 2],
         [ 1],
         [ 1]]],
    
    
       [[[ 9],
         [ 5],
         [ 0],
         [ 1],
         [ 1]]],
    
    
       [[[ 0],
         [ 1],
         [ 0],
         [-1],
         [ 1]]]])
    
  2. Retrieve each instance´s features set, like so:

    # 1st instance
    inst1_1 = feat1[0].ravel()  # instance1 1st feature
    inst1_2 = feat2[0].ravel()  # instance1 2nd feature
    inst1_3 = feat3[0].ravel()  # instance1 3rd feature
    inst1_4 = feat4[0].ravel()  # instance1 4th feature
    
    # 2nd instance 
    inst2_1 = feat1[1].ravel()  # instance2 1st feature
    inst2_2 = feat2[1].ravel()  # instance2 2nd feature
    inst2_3 = feat3[1].ravel()  # instance2 3rd feature
    inst2_4 = feat4[1].ravel()  # instance2 4th feature
    
    # 3rd instance 
    inst3_1 = feat1[2].ravel()  # instance3 1st feature
    inst3_2 = feat2[2].ravel()  # instance3 2nd feature
    inst3_3 = feat3[2].ravel()  # instance3 3rd feature
    inst3_4 = feat4[2].ravel()  # instance3 4th feature
    
  3. Then calculate the Fourier transform (for each feature of each instance).

    inst1_1_fft = fft(inst1_1)
    inst1_2_fft = fft(inst1_2)
    inst1_3_fft = fft(inst1_3)
    inst1_4_fft = fft(inst1_4)
    
    inst2_1_fft = fft(inst2_1)
    inst2_2_fft = fft(inst2_2)
    inst2_3_fft = fft(inst2_3)
    inst2_4_fft = fft(inst2_4)
    
    inst3_1_fft = fft(inst3_1)
    inst3_2_fft = fft(inst3_2)
    inst3_3_fft = fft(inst3_3)
    inst3_4_fft = fft(inst3_4)
    

This is absolutely dummies approach. Besides, I cannot do this on my real dataset where I have over 60K instances.

Questions:

  1. How do I do this at once for the whole array A?

  2. The frequencies seem to be the same, for feature of each instance. Am I doing this correctly? He's how I do:

sampling_rate = A.shape[2] # each instance is fixed-sized A.shape[2] (5 in this e.g)
N = inst1_1_fft.size

inst1_1_fft_freq = fftfreq(N, d = 1 / sampling_rate)

inst1_1_fft_freq
array([ 0.,  1.,  2., -2., -1.])

Solution

  • In order to perform a fast Fourier transform along all your instances' features at once, you may do it along axis=2. As an alternative, you may first reshape A, by using np.reshape and np.swapaxes, prior to the fft computation, one possible solution is:

    A.shape
    >>> (3, 1, 5, 4)
    
    N = 5 # number of time samples for the feature axis
    inst_reshape = A.swapaxes(1,3).reshape((-1,N))
    inst_reshape.shape
    >>> (12, 5) # (number of instances x number of features, time sampling)
    

    which follows the same instance / feature ordering as in your question. You may as well check this by comparing the resulting inst_reshape array with its manually assembled counterpart array:

    inst_vstack = np.vstack((inst1_1,inst1_2,inst1_3,inst1_4,\
                             inst2_1,inst2_2,inst2_3,inst2_4,\
                             inst3_1,inst3_2,inst3_3,inst3_4))
    np.allclose(inst_vstack,inst_reshape)
    >>> True
    

    Having done so, you may now call np.fft.fft once along the column axis axis=1 of the newly reshaped array:

    sampling_rate = A.shape[2]
    sample_spacing = 1/sampling_rate
    
    freq = np.linspace(0.0, 1.0/(2.0*sample_spacing), N//2)
    X = np.fft.fft(inst_reshape, axis=1)
    amp = 2/N*np.abs(X[:N//2])
    
    from matplotlib import pyplot as plt
    plt.plot(freq, amp)
    

    I would definitely encourage you to take a look at the following question where the general idea behind these kind of array transformations is nicely explained.