I am brand new to Digital Signal Processing and I am trying to find the peak of an audio file's spectrum. I usually open the audio file with Audacity and plot the spectrum.
I could find the peak at 120 Hz by inspecting the plotted spectrum, but that requires some manual work.
I would like to find the peak programmatically with Python. I am not sure which spectrum Audacity plots, but I am assuming it is the spectrogram. I tried to find the peak programmatically as below:
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
import numpy as np
sample_rate, samples = wavfile.read('audio1.wav')
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)
#get maximum
x,y=np.where(spectrogram == spectrogram.max())
print("Frequency index where the maximum is")
print(x)
print("Frequency Value")
print(frequencies[x])
However, when I run the code above, the frequency of the maximum comes out as 74.21875, which is very far from the 120 Hz I found in Audacity.
So, what am I getting wrong here? Is there any way to do such a task with Python? Or is the spectrogram the wrong place to look for the maximum?
P.S.: you can find my audio file here
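One likely reason for the 74.21875 Hz result, sketched under assumptions: signal.spectrogram splits the signal into short segments (256 samples by default), so its frequency bins are sample_rate / 256 apart and a peak can only be reported at a multiple of that spacing. The sample rate of 19000 Hz and the synthetic tone below are assumptions for illustration, not values read from audio1.wav.

```python
import numpy as np
from scipy import signal

# Hypothetical sample rate, chosen because 19000 / 256 == 74.21875.
sample_rate = 19000

# Two seconds of a synthetic 120 Hz tone (not the asker's file).
t = np.arange(sample_rate * 2) / sample_rate
tone = np.sin(2 * np.pi * 120.0 * t)

# Default settings: 256-sample segments.
frequencies, times, spectrogram = signal.spectrogram(tone, sample_rate)

# Spacing between adjacent frequency bins in Hz.
bin_spacing = frequencies[1] - frequencies[0]
print(len(frequencies))
print(bin_spacing)
```

With bins this coarse, 120 Hz falls between two bins, so resolving it needs a finer grid: a larger nperseg, or an FFT over the whole signal.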
If you want to analyze frequencies, numpy.fft.fft is the usual way to transform an entire audio signal from the time domain into the frequency domain. Much like wavfile.read reads your audio as an array of amplitudes for each time step (i.e. sample), np.fft.fft transforms this array into an array of complex values for each frequency step, and np.abs of those values gives the amplitudes.
On this FFT-transformed array, find the index of the maximum amplitude via np.argmax, multiply it by the sample rate and divide by the length of your signal (in samples, not seconds) to get its frequency in hertz.
Or in code:
from scipy.io import wavfile
import numpy as np
sample_rate, samples = wavfile.read('audio1.wav')
# magnitude of the FFT of the whole signal (assumes a mono file, i.e. a 1-D array)
fft_samples = np.abs(np.fft.fft(samples))
peak_index = np.argmax(fft_samples)  # index of the largest amplitude
# convert the bin index to hertz: index * sample_rate / number_of_samples
max_frequency = peak_index / len(samples) * sample_rate
print(
f"""
Frequency index where the maximum is:
{peak_index}
Maximum Frequency:
{str(max_frequency)} Hz
Frequency Value:
{fft_samples[peak_index]}
Frequency Value once again (should be the same if our calculations were right):
{np.max(fft_samples)}
"""
)
Output:
Frequency index where the maximum is:
71790
Maximum Frequency:
119.96330416270493 Hz
Frequency Value:
11726922.812383095
Frequency value once again to check that our calculations were right:
11726922.812383095
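The same index-to-frequency conversion can be checked on a signal whose peak is known in advance. The following is only a sketch under stated assumptions: peak_frequency is a hypothetical helper, the test tone is synthetic, np.fft.rfft is swapped in for np.fft.fft because the input is real-valued, and averaging channels and zeroing the DC bin are illustrative choices rather than part of the code above.

```python
import numpy as np

def peak_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest non-DC component."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 2:
        # Multi-channel audio (samples x channels): average down to mono.
        samples = samples.mean(axis=1)
    # rfft keeps only the non-negative frequencies of a real signal.
    spectrum = np.abs(np.fft.rfft(samples))
    # A constant offset would otherwise dominate at 0 Hz.
    spectrum[0] = 0.0
    peak_index = int(np.argmax(spectrum))
    # Bin index -> hertz: index * sample_rate / number_of_samples.
    return peak_index * sample_rate / len(samples)

# Synthetic test signal: DC offset + strong 120 Hz tone + weaker 440 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate * 2) / sample_rate
tone = 2.0 + np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)

mono_peak = peak_frequency(tone, sample_rate)
stereo_peak = peak_frequency(np.column_stack([tone, tone]), sample_rate)
print(mono_peak, stereo_peak)
```

For a file read with wavfile.read, the call would be peak_frequency(samples, sample_rate); whether skipping the DC bin is appropriate depends on the recording.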
My approach should be a bit faster than mad's, since I'm not creating an array the size of the input signal through np.fft.fftfreq just to get one value.
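To illustrate that both routes give the same number, here is a small sketch comparing the direct formula with a lookup in np.fft.fftfreq. The sample rate, signal length and peak index are made-up values for the comparison, not those of audio1.wav.

```python
import numpy as np

# Hypothetical values for illustration only.
sample_rate = 8000
n_samples = 16000
peak_index = 240

# Direct conversion of one bin index to hertz.
direct = peak_index * sample_rate / n_samples

# Same value looked up in the full frequency axis,
# which allocates an array of n_samples entries.
freq_axis = np.fft.fftfreq(n_samples, d=1 / sample_rate)
via_fftfreq = freq_axis[peak_index]

print(direct, via_fftfreq, freq_axis.shape)
```

The full axis is still handy when plotting the spectrum; for a single peak, the one-line formula avoids the extra allocation.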