I am converting Python code to MATLAB. The Python code uses the following command:
stft_ch = librosa.core.stft(audio_input[:, ch_cnt], n_fft=self._nfft,
hop_length=self._hop_len, win_length=self._win_len,
window='hann')
where audio_input.shape=(2880000, 4), self._nfft=2048, self._hop_len=960 and self._win_len=1920.
When converting to MATLAB I used:
stft_ch = spectrogram(audio_input(:, ch_cnt), hann(win_len), win_len-hop_len, nfft);
where I verified that size(audio_input)=[2880000, 4], win_len=1920, win_len-hop_len=960 and nfft=2048.
I am getting an output from MATLAB with size(stft_ch)=[1025, 2999], whereas Python shows stft_ch.shape=(1025, 3001). The size 2999 in the MATLAB output is clear and fits the documentation, where k = ⌊(Nx – noverlap)/(length(window) – noverlap)⌋ if window is a vector.
However, I could not find in the Python documentation how the length of t is set.
Why is there a difference between sizes? Is my conversion good?
Is there a Python function which produces an output more similar to MATLAB's spectrogram(), so that I can get the complex output with the same size?
I have found the answer myself.
The MATLAB function spectrogram() outputs a vector of times which correspond to the middle of each window, and it omits a final window that would not fit entirely within the signal. For example, a 10-sample signal with a 3-sample window and 1 sample of overlap results in the following 4 windows: 1:3, 3:5, 5:7, 7:9, where m:n represents a window that includes samples m through n, inclusive.
The centers of the windows are therefore 2, 4, 6, 8. Note that the 10th sample is not included.
It seems that MATLAB uses the largest number_of_windows subject to (number_of_windows-1)*hop_length+window_size<=number_of_samples.
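The rule above can be checked with a short Python sketch. The helper names matlab_num_windows and window_ranges are hypothetical; they only evaluate the k formula quoted from the spectrogram documentation and list the 1-based sample range of each window.

```python
def matlab_num_windows(num_samples, window_size, noverlap):
    """Largest k with (k-1)*hop + window_size <= num_samples."""
    hop = window_size - noverlap
    return (num_samples - noverlap) // hop


def window_ranges(num_samples, window_size, noverlap):
    """1-based (first, last) sample of each window, both inclusive."""
    hop = window_size - noverlap
    k = matlab_num_windows(num_samples, window_size, noverlap)
    return [(1 + i * hop, i * hop + window_size) for i in range(k)]


# Toy example: 10 samples, 3-sample window, 1 sample of overlap.
print(window_ranges(10, 3, 1))

# Settings from the question: 1920-sample window, 960 samples of overlap.
print(matlab_num_windows(2880000, 1920, 960))
```

For the toy example this lists the windows 1:3, 3:5, 5:7, 7:9, and for the settings in the question it gives 2999.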
In the Python version, librosa.core.stft(), on the other hand, t is the time of the first sample of each frame, and the frames cover more than the input signal. For example, a 10-sample signal with a 3-sample window and a 2-sample hop (hop, not overlap) results in the following 5 windows: 1:3, 3:5, 5:7, 7:9, 9:11, where m:n represents a window that includes samples m through n, inclusive.
The beginnings of the windows are therefore 1, 3, 5, 7, 9. Note that the non-existent 11th sample is included.
It seems that librosa uses the smallest number_of_windows subject to number_of_windows*hop_length>number_of_samples.
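The same kind of check for the rule above, as a sketch rather than a description of librosa's internals. The helper names librosa_num_frames and smallest_n_brute_force are hypothetical. The closed form num_samples // hop_length + 1 is the smallest count satisfying the inequality; it also agrees with the frame count obtained when the signal is padded by n_fft // 2 on both sides, which is what librosa's center=True option (its default) does.

```python
def librosa_num_frames(num_samples, hop_length):
    """Smallest n with n * hop_length > num_samples (the rule stated above)."""
    return num_samples // hop_length + 1


def smallest_n_brute_force(num_samples, hop_length):
    """Same quantity found by direct search, to confirm the closed form."""
    n = 0
    while n * hop_length <= num_samples:
        n += 1
    return n


# Settings from the question.
print(librosa_num_frames(2880000, 960))

# Toy example: 10 samples, hop of 2.
print(librosa_num_frames(10, 2))
```

For the settings in the question this gives 3001. For the 10-sample toy example with a hop of 2 it gives 6, one more than the windows listed above (1:3 through 9:11), so the toy example is better read as an illustration than as an exact trace.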
In my case:
(2999-1)*960+1920=2880000<=2880000 for MATLAB.
3001*960=2880960>2880000, while 3000*960=2880000 is not greater than 2880000, in Python.
Note that the times can be centered in Python by setting the center=True flag.
This is the best explanation I could find.
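As for getting a complex output of MATLAB's size from Python, one option is to frame the signal the way the formula k = ⌊(Nx – noverlap)/(length(window) – noverlap)⌋ describes: no padding of the signal, frames of win_len samples, and each windowed frame zero-padded to nfft before the FFT. The following is a minimal NumPy sketch under those assumptions. The function name matlab_style_stft is hypothetical, np.hanning stands in for MATLAB's hann, and only the output shape is checked here, not the values against MATLAB.

```python
import numpy as np


def matlab_style_stft(x, win_len, hop_len, nfft):
    """One-sided STFT with no signal padding: frames of win_len samples,
    each zero-padded to nfft before the FFT."""
    x = np.asarray(x, dtype=float)
    noverlap = win_len - hop_len
    k = (len(x) - noverlap) // (win_len - noverlap)  # number of full windows
    # Row i holds the 0-based sample indices of frame i.
    idx = hop_len * np.arange(k)[:, None] + np.arange(win_len)[None, :]
    frames = x[idx] * np.hanning(win_len)            # symmetric Hann window
    # rfft zero-pads each frame to nfft and returns nfft // 2 + 1 bins.
    return np.fft.rfft(frames, n=nfft, axis=1).T     # shape: (nfft // 2 + 1, k)


# A shorter signal than in the question, with the same window settings.
rng = np.random.default_rng(0)
signal = rng.standard_normal(96000)
S = matlab_style_stft(signal, win_len=1920, hop_len=960, nfft=2048)
print(S.shape)
```

With a 2880000-sample channel and the same settings, the frame count k evaluates to 2999, so the result would have the size reported by MATLAB in the question. Scaling and window details may still differ from spectrogram().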