I am facing a very strange problem with my plots. My code records my voice from the microphone and then makes some plots: a plot of the voice in the time domain, a plot in the frequency domain, and a spectrogram. The problem is that my frequency-domain plot does not seem to be correct. For example, have a look at my plots.
In this recording I am saying 'one, two, three, four' or something like that. The time-domain plot makes sense. The spectrogram also makes sense to me, because the largest Fourier magnitudes are at typical human voice frequencies (~100 Hz).
So what could be going wrong? My code is below.
import matplotlib.pyplot as plt
import numpy as np
import scipy.fft
import sounddevice as sd
from scipy import signal, fft
Fs = 8000 # Sampling frequency
duration = 5 # Recording duration in seconds
voice = sd.rec(frames=duration * Fs, samplerate=Fs, channels=1, dtype='int16') # Capture the voice
# frames sets the recording length in samples; dtype='int16' gives 16 bits per sample.
sd.wait() # block until the recording has finished
time = np.arange(len(voice)) / Fs # time axis in seconds, one point per sample
# consecutive points are exactly 1/Fs apart
plt.plot(voice / len(voice))
plt.ylabel('Voice amplitude')
plt.xlabel('No of sample')
plt.title("Voice Signal with respect to sample number")
plt.show()
plt.plot(time, voice / len(voice)) # plot in seconds
plt.title("Voice Signal")
plt.xlabel("Time [seconds]")
plt.ylabel("Voice amplitude")
plt.show()
plt.plot((10**3)*time, voice / len(voice)) # plot in milliseconds
plt.title("Voice Signal")
plt.xlabel("Time [milliseconds]")
plt.ylabel("Voice amplitude")
plt.show()
N = len(voice)
# Fourier transform
F = scipy.fft.fft(voice) / N
#f = np.linspace(0, Fs - Fs / N, N)
f = fft.fftfreq(n=N, d=1 / Fs)[:N // 2]
#f = np.linspace(0, 4000, N//2)
plt.plot(f, abs(F[0:N // 2]))
plt.title("FFT of the signal")
plt.xlabel('Frequency')
plt.ylabel('Power of Frequency')
plt.show()
Voice = voice.flatten() # flatten the 2-D (N, 1) recording into a 1-D numpy array
print(Voice)
freq, t, stft = signal.spectrogram(Voice, Fs, mode='complex')
#Sxx, freq, t = plt.specgram(Voice, Fs=Fs, mode='magnitude')
print(stft)
print(freq)
print(t)
plt.pcolormesh(t, freq, abs(stft), shading='gouraud')
plt.title('Spectrogramm using STFT amplitude')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [seconds]')
plt.show()
With the 2D array voice (most likely Nx1 for a mono recording), scipy.fft.fft ends up computing a batch of N 1D FFTs of length 1, because by default it transforms along the last axis. Since the FFT of a single-value sequence is the identity (the only output bin is the sample itself), what you see in your 2nd plot is the absolute value of the first half of your time-domain signal.
Try computing the FFT on a 1D array (a single channel), for example:
F = scipy.fft.fft(voice[:,0]) / N
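To see the effect without a microphone, here is a minimal sketch that replaces the recording with a synthetic 440 Hz sine reshaped to (N, 1). The tone, its frequency and the names tone, F_wrong, F_right and peak_hz are assumptions made up for this illustration; they are not part of the original code.

```python
import numpy as np
import scipy.fft

Fs = 8000                      # sampling frequency, as in the question
N = Fs                         # one second of samples -> 1 Hz bin spacing
t = np.arange(N) / Fs

# Hypothetical stand-in for the recording: a 440 Hz tone.
tone = np.sin(2 * np.pi * 440 * t)
voice = tone.reshape(-1, 1)    # shape (N, 1), like a mono sd.rec() result

# FFT along the last axis (length 1): N transforms of one sample each,
# so the output is just the input samples again.
F_wrong = scipy.fft.fft(voice)

# FFT of the single channel as a 1-D array: one transform of length N.
F_right = scipy.fft.fft(voice[:, 0]) / N

f = scipy.fft.fftfreq(n=N, d=1 / Fs)[:N // 2]
peak_hz = f[np.argmax(np.abs(F_right[:N // 2]))]

print(F_wrong.shape, F_right.shape)
print(np.allclose(F_wrong[:, 0], tone))  # the "wrong" FFT equals the signal
print(peak_hz)                           # frequency of the largest bin
```

Plotting abs(F_right[:N // 2]) against f should show a single peak at the tone frequency, whereas abs(F_wrong[:N // 2]) just traces the rectified waveform.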