python plot signal-processing fft frequency

Plotting the Fourier transform of a voice recording in the frequency domain with Python

I am facing a very strange problem with my plots. My code records my voice from the microphone and then makes some plots: the voice in the time domain, the voice in the frequency domain, and a spectrogram. The problem is that my frequency-domain plot does not seem to be correct. For example, have a look at my plots.

In this recording I am saying 'one, two, three, four' or something like that. The time-domain plot makes sense. The spectrogram also makes sense to me, because the largest Fourier magnitudes are at normal human voice frequencies (~100 Hz).

The problem is that my Fourier transform plot in the frequency domain seems to show very high frequencies with very high magnitude, while the human voice frequencies (1 to 1000 Hz) have zero value.

What could be going wrong? Here is my code:

import matplotlib.pyplot as plt
import numpy as np
import scipy.fft
import sounddevice as sd
from scipy import signal, fft

Fs = 8000  # Sampling frequency
duration = 5  # Recording duration in seconds
voice = sd.rec(frames=duration * Fs, samplerate=Fs, channels=1, dtype='int16')  # Capture the voice
# frames sets the recording length indirectly (duration * Fs samples); dtype='int16' means 16 bits per sample.
sd.wait()  # block until the recording has finished
time = np.linspace(0, (len(voice) - 1) / Fs, len(voice))  # one time stamp per sample
# consecutive points are 1/Fs seconds apart
plt.plot(voice / len(voice))
plt.ylabel('Voice amplitude')
plt.xlabel('No of sample')
plt.title("Voice Signal with respect to sample number")
plt.show()
plt.plot(time, voice / len(voice))  # plot in seconds
plt.title("Voice Signal")
plt.xlabel("Time [seconds]")
plt.ylabel("Voice amplitude")
plt.show()
plt.plot((10**3)*time, voice / len(voice))  # plot in milliseconds
plt.title("Voice Signal")
plt.xlabel("Time [milliseconds]")
plt.ylabel("Voice amplitude")
plt.show()
N = len(voice)
# Fourier transform
F = scipy.fft.fft(voice) / N
#f = np.linspace(0, Fs - Fs / N, N)
f = fft.fftfreq(n=N, d=1 / Fs)[:N // 2]
#f = np.linspace(0, 4000, N//2)
plt.plot(f, abs(F[0:N // 2]))
plt.title("FFT of the signal")
plt.xlabel('Frequency')
plt.ylabel('Power of Frequency')
plt.show()
Voice = voice.flatten()  # flatten the (N, 1) 2-D array into a 1-D NumPy array
print(Voice)
freq, t, stft = signal.spectrogram(Voice, Fs, mode='complex')
#Sxx, freq, t = plt.specgram(Voice, Fs=Fs, mode='magnitude')
print(stft)
print(freq)
print(t)
plt.pcolormesh(t, freq, abs(stft), shading='gouraud')
plt.title('Spectrogramm using STFT amplitude')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [seconds]')
plt.show()

Solution

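For a mono recording, sd.rec returns a 2-D array voice of shape (N, 1). scipy.fft.fft transforms along the last axis by default (axis=-1), which here has length 1, so it ends up computing a batch of N 1-D FFTs of length 1. Since the FFT of a single value is the value itself (an identity), what you see in your 2nd plot is the absolute value of the first half of your time-domain signal, divided by N: the first N // 2 = 20000 samples (the first 2.5 seconds) are simply laid out along the 0 to 4000 Hz axis. Your spectrogram looks right because it is computed from the flattened 1-D array Voice.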
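A minimal sketch of this effect, assuming a small synthetic (N, 1) array as a stand-in for the microphone recording so that it runs without sounddevice: transforming along the default last axis returns the input unchanged, while transforming the single channel as a 1-D array gives an actual spectrum.

```python
import numpy as np
import scipy.fft

# Hypothetical stand-in for sd.rec(..., channels=1): a column vector of shape (N, 1)
N = 8
x = np.arange(N, dtype=float).reshape(N, 1)

# Default axis=-1 has length 1, so this computes N independent length-1 FFTs
F_batch = scipy.fft.fft(x)
print(F_batch.shape)            # same shape as the input, (8, 1)
print(np.allclose(F_batch, x))  # a length-1 FFT is the identity

# Transforming the single channel as a 1-D array gives the real spectrum
F_1d = scipy.fft.fft(x[:, 0])
print(F_1d.shape)               # one bin per sample, (8,)
print(F_1d[0].real)             # DC bin = sum of the samples
```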

Compute the FFT on a 1-D array (a single channel) instead, for example:

F = scipy.fft.fft(voice[:,0]) / N
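Put together with the plotting step, a sketch could look like the following. Assumptions: a synthetic 440 Hz tone reshaped to (N, 1) replaces the recording, and scipy.fft.rfft / scipy.fft.rfftfreq are swapped in for fft plus manual slicing (for real input they give the non-negative half of the same spectrum, including the Fs/2 bin). If the fix works, the magnitude peak should land at the tone frequency rather than reproducing the time-domain waveform.

```python
import numpy as np
import scipy.fft
import matplotlib.pyplot as plt

Fs = 8000        # sampling frequency, as in the question
duration = 5     # seconds
N = duration * Fs

# Hypothetical stand-in for the (N, 1) int16 array returned by sd.rec
t = np.arange(N) / Fs
voice = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16).reshape(N, 1)

x = voice[:, 0].astype(float)        # single channel as a 1-D float array
F = scipy.fft.rfft(x) / N            # non-negative frequency bins only
f = scipy.fft.rfftfreq(N, d=1 / Fs)  # bin frequencies from 0 to Fs/2 in Hz

peak_hz = f[np.argmax(np.abs(F))]    # frequency with the largest magnitude
print(peak_hz)

plt.plot(f, np.abs(F))
plt.title("FFT of the signal")
plt.xlabel("Frequency [Hz]")
plt.ylabel("Magnitude")
plt.show()
```

Passing axis=0 to scipy.fft.fft (or flattening with voice.ravel()) would also transform along the samples instead of the length-1 channel axis.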