In the following 4D
array, each column represents an attribute for machine learning model development.
import numpy as np
from scipy.fft import fft, fftfreq, fftshift
A = np.array([
[[[0, 1, 2, 3],
[3, 0, 1, 2],
[2, 3, 0, 1],
[1, 3, 2, 1],
[1, 2, 3, 0]]],
[[[9, 8, 7, 6],
[5, 4, 3, 2],
[0, 9, 8, 3],
[1, 9, 2, 3],
[1, 0, -1, 2]]],
[[[0, 7, 1, 2],
[1, 2, 1, 0],
[0, 2, 0, 7],
[-1, 3, 0, 1],
[1, 0, 1, 0]]]
])
A.shape
(3, 1, 5, 4)
In this example, A
has 3-instances, each instance represented by 4-features (1 for each column of the inner arrays (shaped (1,5,4)
). Features are represented in time-domain
.
I needed a handy way to calculate the frequency-domain
(Fourier transform )for each feature of each instance.
Details
Considering the given example, I do it the following way.
Get the array for all features (1 for each feature like this:
feat1 = A[..., [0]] # the first feature
feat2 = A[..., [1]] # the second feature
feat3 = A[..., [2]] # 3rd feature etc...
feat4 = A[..., [3]] # 4th feature etc...
So that the first feature in the example data is:
feat1
array([[[[ 0],
[ 3],
[ 2],
[ 1],
[ 1]]],
[[[ 9],
[ 5],
[ 0],
[ 1],
[ 1]]],
[[[ 0],
[ 1],
[ 0],
[-1],
[ 1]]]])
Retrieve each instance´s features set, like so:
# 1st instance
inst1_1 = feat1[0].ravel() # instance1 1st feature
inst1_2 = feat2[0].ravel() # instance1 2nd feature
inst1_3 = feat3[0].ravel() # instance1 3rd feature
inst1_4 = feat4[0].ravel() # instance1 4th feature
# 2nd instance
inst2_1 = feat1[1].ravel() # instance2 1st feature
inst2_2 = feat2[1].ravel() # instance2 2nd feature
inst2_3 = feat3[1].ravel() # instance2 3rd feature
inst2_4 = feat4[1].ravel() # instance2 4th feature
# 3rd instance
inst3_1 = feat1[2].ravel() # instance3 1st feature
inst3_2 = feat2[2].ravel() # instance3 2nd feature
inst3_3 = feat3[2].ravel() # instance3 3rd feature
inst3_4 = feat4[2].ravel() # instance3 4th feature
Then calculate the Fourier transform (for each feature of each instance).
inst1_1_fft = fft(inst1_1)
inst1_2_fft = fft(inst1_2)
inst1_3_fft = fft(inst1_3)
inst1_4_fft = fft(inst1_4)
inst2_1_fft = fft(inst2_1)
inst2_2_fft = fft(inst2_2)
inst2_3_fft = fft(inst2_3)
inst2_4_fft = fft(inst2_4)
inst3_1_fft = fft(inst3_1)
inst3_2_fft = fft(inst3_2)
inst3_3_fft = fft(inst3_3)
inst3_4_fft = fft(inst3_4)
This is absolutely dummies approach
. Besides, I cannot do this on my real dataset where I have over 60K instances.
Questions:
How do I do this at once for the whole array A
?
The frequencies seem to be the same, for feature of each instance. Am I doing this correctly? He's how I do:
sampling_rate = A.shape[2] # each instance is fixed-sized A.shape[2] (5 in this e.g)
N = inst1_1_fft.size
inst1_1_fft_freq = fftfreq(N, d = 1 / sampling_rate)
inst1_1_fft_freq
array([ 0., 1., 2., -2., -1.])
In order to perform a fast Fourier transform along all your instances' features at once, you may do it along axis=2
. As an alternative, you may first reshape A
, by using np.reshape
and np.swapaxes
, prior to the fft
computation, one possible solution is:
A.shape
>>> (3, 1, 5, 4)
N = 5 # number of time samples for the feature axis
inst_reshape = A.swapaxes(1,3).reshape((-1,N))
inst_reshape.shape
>>> (12, 5) # (number of instances x number of features, time sampling)
which follows the same instance / feature ordering as in your question. You may as well check this by comparing the resulting inst_reshape
array with its manually assembled counterpart array:
inst_vstack = np.vstack((inst1_1,inst1_2,inst1_3,inst1_4,\
inst2_1,inst2_2,inst2_3,inst2_4,\
inst3_1,inst3_2,inst3_3,inst3_4))
np.allclose(inst_vstack,inst_reshape)
>>> True
Having done so, you may now call np.fft.fft
once along the column axis axis=1
of the newly reshaped array:
sampling_rate = A.shape[2]
sample_spacing = 1/sampling_rate
freq = np.linspace(0.0, 1.0/(2.0*sample_spacing), N//2)
X = np.fft.fft(inst_reshape, axis=1)
amp = 2/N*np.abs(X[:N//2])
from matplotlib import pyplot as plt
plt.plot(freq, amp)
I would definitely encourage you to take a look at the following question where the general idea behind these kind of array transformations is nicely explained.