Search code examples
pythonscipyfftpeak-detection

Why does duration of WAV file affect the values returned by Scipy find_peaks?


I'm writing some very simple code to do frequency analysis on a WAV file (as recorded by a microphone into Audacity): Do an FFT on the file, and find the resulting peak frequency values. As far as I can tell, I've followed the library documentation correctly.

The WAV file is a frequency sweep from 1 to 5000Hz, back to 1Hz.

When applying Scipy find_peaks to the FFT data, the values it returns are multiplied by the length of the file (in seconds). See the following plots:

Plots of: 1: Time-based data. 2. FFT single-sided. 3. FFT with peaks (note different frequency range). 4. FFT with corrected peaks (peaks/secs).

The 4th plot shows that if I divide the peaks data by the file duration, the values are correct. I don't like this solution, as I can find no documentation to back it up. Does anyone know what I'm doing wrong here?

My code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import scipy.io.wavfile as wavfile
import scipy
import scipy.fftpack
import scipy.fft
import numpy as np
from scipy.signal import find_peaks as fp
from matplotlib import pyplot as plt

#read the wav file and print the sample rate
fs_rate, signal = wavfile.read("1Hz-5k_sweep_phone_single.wav")
print ("Sampling rate:", fs_rate)

#calculate number of channels of signal
l_audio = len(signal.shape)
print ("Channels:", l_audio)

#trim one channel, if channels = 2
if l_audio == 2:
    signal = signal.sum(axis=1) / 2
    
#print number of samples
N = signal.shape[0]
print ("Number of samples N:", N)

#calculate duration of file
secs = N / float(fs_rate)
print ("File duration (secs):", secs)

#calculate sample interval
Ts = 1.0/fs_rate # sampling interval in time
print ("Timestep between samples Ts:", Ts)


t = np.arange(0, secs, Ts) # time vector as scipy arange field / numpy.ndarray

#Do FFT
FFT = abs(scipy.fft.fft(signal))
#print("FFT size: ",FFT.shape[0])

#split the FFT to one-sided range
FFT_side = FFT[range(N//2)] # one side FFT range
print("FFT upper side: ", FFT_side)

#calculate the frequency bins.  Put into array
freqs = scipy.fft.fftfreq(signal.size, t[1]-t[0])
print("t:", t[1]-t[0])
fft_freqs = np.array(freqs)
#print("Frequency bins: ",fft_freqs)

#Split the frequencies to one-sided range
freqs_side = freqs[range(N//2)] # one side frequency range
fft_freqs_side = np.array(freqs_side)
print("size of upper-side fft: ",fft_freqs_side.shape[0])


#do peak detection on data.
h = (2.0e+04,2.0e+08)  #detected height range
p = None  #prominence
d = 100 #minimum distance between peaks (ignore closer peaks)
peaks, _ = fp(x=FFT_side, height=h,prominence=p,distance=d)
print("Peak detected frequencies (Hz):",peaks)
print("number of peaks: ", peaks.shape[0])



#data plotting
fig, axes = plt.subplots(nrows = 2, ncols = 1, figsize= (12,9))

ax1 = plt.subplot(411)
ax1.margins(0.05)           # Default margin is 0.05, value 0 means fit
ax1.plot(t, signal, "g",linewidth=0.1)
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')

ax2 = plt.subplot(412)
ax2.plot(freqs_side, abs(FFT_side), "b")
ax2.set_xlabel('Frequency (Hz)')
ax2.set_ylabel('FFT single-sided')
ax2.set_xlim(0,5500)
ax2.set_ylim(0,200000)

ax3 = plt.subplot(413)
ax3.plot(freqs_side, abs(FFT_side), "b")
ax3.plot(peaks, FFT_side[peaks],"x",color='tab:orange')
ax3.set_xlabel('Frequency (Hz)')
ax3.set_ylabel('FFT single-sided with peaks')
ax3.set_xlim(0,55000)
ax3.set_ylim(0,200000)

ax4 = plt.subplot(414)
ax4.plot(freqs_side, abs(FFT_side), "b")
ax4.plot(peaks/secs, FFT_side[peaks],"x",color='tab:orange')
ax4.set_xlabel('Frequency (Hz)')
ax4.set_ylabel('FFT single-sided: corrected peaks')
ax4.set_xlim(0,5500)
ax4.set_ylim(0,200000)

fig.tight_layout() # Or equivalently,  "plt.tight_layout()"

plt.show()

Multiple entries on stackoverflow suggested that find_peaks was a preferable way to approach finding FFT peaks. I've looked at the documentation for find_peaks, and verified that my code follows the examples there and also in various other examples that claim to work.

I also searched stackoverflow and google for related issues, but didn't find anything.

As documented above, I've 'solved' it by dividing the peak values by the duration of the WAV file - but I'm convinced that I'm doing something wrong elsewhere in my code.


Solution

  • That is because peaks are not the frequencies, peaks are the indices. You have to use:

    ax3 = plt.subplot(413)
    ax3.plot(freqs_side, abs(FFT_side), "b")
    ax3.plot(freqs_side[peaks], FFT_side[peaks],"x",color='tab:orange')
    ax3.set_xlabel('Frequency (Hz)')
    ax3.set_ylabel('FFT single-sided with peaks')
    ax3.set_xlim(0,55000)
    ax3.set_ylim(0,200000)
    

    When you are dividing them by seconds, you are actually converting them back to Hz :D