I'm analyzing what is essentially a respiratory waveform, constructed in 3 different shapes (the data originates from MRI, so multiple echo times were used; see here if you'd like some quick background). Here are a couple of segments of two of the plotted waveforms for some context:
For each waveform, I'm trying to perform a DFT in order to discover the dominant frequency or frequencies of respiration.
My issue is that when I plot the DFTs that I perform, I get one of two things, dependent on the FFT library that I use. Furthermore, neither of them is representative of what I am expecting. I understand that data doesn't always look the way we want, but I clearly have waveforms present in my data, so I would expect a discrete Fourier transform to produce a frequency peak somewhere reasonable. For reference here, I would expect about 80 to 130 Hz.
My data is stored in a pandas
data frame, with each echo time's data in a separate series. I'm applying the FFT function of choice (see the code below) and receiving different results for each of them.
SciPy (fftpack
)
import pandas as pd
import scipy.fftpack
# temporary copy to maintain data structure
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame
lead_pts_fft_df.magnitude = lead_pts_df.magnitude.apply(scipy.fftpack.fft)
lead_pts_fft_df
NumPy:
import pandas as pd
import numpy as np
# temporary copy to maintain data structure
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame
lead_pts_fft_df.magnitude = lead_pts_df.magnitude.apply(np.fft.fft)
lead_pts_fft_df
The rest of the relevant code:
ECHO_TIMES = [0.080, 0.200, 0.400] # milliseconds
f_s = 1 / (0.006) # 0.006 = time between samples
freq = np.linspace(0, 29556, 29556) * (f_s / 29556) # (29556 = length of data)
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(freq, lead_pts_fft_df.magnitude[idx].real, # real part
freq, lead_pts_fft_df.magnitude[idx].imag, # imaginary part
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()
Post-DFT (SciPy fftpack
):
Post-DFT (NumPy)
Here is a link to the dataset (on Dropbox) used to create these plots, and here is a link to the Jupyter Notebook.
EDIT:
I used the posted advice and took the magnitude (absolute value) of the data, and also plotted with a logarithmic y-axis. The new results are posted below. It appears that I have some wraparound in my signal. Am I not using the correct frequency scale? The updated code and plots are below.
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(freq[1::], np.log(np.abs(lead_pts_fft_df.magnitude[idx][1::])),
label=lead_pts_df.index[idx], # apply labels
color=plot_colors[idx]) # colors
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()
I've figured out my issues.
Cris Luengo was very helpful with this link, which helped me determine how to correctly scale my frequency bins and plot the DFT appropriately.
ADDITIONALLY: I had forgotten to take into account the offset present in the signal. Not only does it cause issues with the huge peak at 0 Hz in the DFT, but it is also responsible for most of the noise in the transformed signal. I made use of scipy.signal.detrend()
to eliminate this and got a very appropriate looking DFT.
# import DFT and signal packages from SciPy
import scipy.fftpack
import scipy.signal
# temporary copy to maintain data structure; original data frame is NOT changed due to ".copy()"
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame AND detrend the signal
lead_pts_fft_df.magnitude = lead_pts_fft_df.magnitude.apply(scipy.signal.detrend)
lead_pts_fft_df.magnitude = lead_pts_fft_df.magnitude.apply(np.fft.fft)
lead_pts_fft_df
Arrange frequency bins accordingly:
num_projections = 29556
N = num_projections
T = (6 * 3) / 1000 # 6*3 b/c of the nature of my signal: 1 pt for each waveform collected every third acquisition
xf = np.linspace(0.0, 1.0 / (2.0 * T), num_projections / 2)
Then plot:
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(xf, 2.0/num_projections * np.abs(lead_pts_fft_df.magnitude[idx][:num_projections//2]),
label=lead_pts_df.index[idx], # apply labels
color=plot_colors[idx]) # colors
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()