As part of a research project, I would like to analyze a sound file by generating its spectrogram.
I have been able to successfully generate the spectrogram of the wave file in MATLAB with frequency on the y-axis and time on the x-axis. I would, however, like to generate the spectrogram with frequency on the x-axis and time on the y-axis. How can this be done?
I have searched through Stack Overflow and have not found any accepted answers.
My code which generates the spectrogram with the frequency on the y-axis and the time on the x-axis (Matlab code):
[song, fs] = wavread('filename.wav');
windowSize = 256;
windowOverlap = [];
freqRange = 0:300;
spectrogram(song, windowSize, windowOverlap, freqRange, fs, 'yaxis');
I changed the parameter 'yaxis' in the call to spectrogram to 'xaxis', and the frequency is now on the x-axis with time on the y-axis. However, the spectrogram I get differs from one generated by a reliable source.
Here is the spectrogram that I generate:
The spectrogram generated by a reliable source (I don't have the code):
Moreover, the color scheme differs between the two spectrograms, and my recording is 50 seconds long whereas the time shown on the axis is only 9 seconds. How can I resolve these issues?
My end task is to generate the spectrogram on an Android device (probably using the GraphView library for Android), so I will have to write the spectrogram code in Java.
Any help on this is greatly appreciated.
Sorry, I don’t have whichever 💸-toolbox-💰 that Mathworks puts spectrogram in, but here’s some code that I put in the public domain that does the job for me.
It’s more hands-on than spectrogram but has many of the latter’s features, as I’ll demonstrate using the handel audio clip that comes with Matlab (‘Hallelujah!’).
I won’t assume you’re familiar with git or Matlab namespaces.
1. Create a directory named +arf somewhere in your Matlab path (e.g., ~/Documents/MATLAB or even your current code directory).
2. Get stft.m and put it in +arf/.
3. Get partition.m and put it into +arf/.

This creates an arf namespace inside which are the arf.stft and arf.partition functions (the latter is used by arf.stft).
clearvars
% Load data: this is an audio clip built into Matlab.
handel = load('handel');
% To hear this audio clip, run the following:
% >> soundsc(handel.y, handel.Fs)
% STFT parameters.
% 1000 samples is roughly 1/8th of a second. A reasonable chunk size.
samplesPerChunk = 1000;
% Overlap a lot between chunks to see a smooth STFT.
overlapSamples = round(samplesPerChunk * 0.9);
% Generate STFT
[stftArr, fVec, tVec] = arf.stft(handel.y, ...
    samplesPerChunk, ...
    'noverlap', overlapSamples, ...
    'fs', handel.Fs);
% Plot results
figure('color', 'white');
imagesc(fVec / 1e3, tVec, 20 * log10(abs(stftArr)).');
axis xy
colorbar
xlabel('frequency (kHz)')
ylabel('time (s)')
caxis(max(caxis) - [40 0])
title('`handel` spectrogram via STFT, top 40 dB')
The code above loads the handel audio clip that’s packaged into Matlab (this is a nine-second clip from George Frideric Handel’s Messiah), computes its STFT with arf.stft(), and plots the magnitude in decibels.
Hint: after you run the code above, or just that load line, you can listen to the original clip with soundsc(handel.y, handel.Fs).
In the spectrogram, you can clearly see the first two long Hallelujahs, then the two shorter ones, and finally the last long one. Time runs along the y-axis as you wished.
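If you want to see what an STFT of this kind does under the hood (for instance, before porting it to Java), here is a minimal sketch in base Matlab. It is not the arf.stft implementation: the synthetic 1 kHz test tone, the hand-written Hann window, and every variable name below are assumptions made for illustration.

```matlab
% Minimal STFT sketch using only base Matlab. NOT arf.stft; all names
% and the choice of a Hann window are assumptions for illustration.

% Synthetic test signal so the sketch runs without any data files.
fs = 8192;                               % sample rate in Hz
x  = sin(2*pi*1000*(0:2*fs-1).'/fs);     % two seconds of a 1 kHz tone

chunkLen = 1000;                         % samples per chunk
overlap  = 900;                          % samples shared by neighbouring chunks
hop      = chunkLen - overlap;           % samples between chunk starts
nfft     = 2^nextpow2(chunkLen);         % FFT length (chunks are zero-padded)

% Periodic Hann window written out by hand (hann() needs a toolbox).
win = 0.5 - 0.5*cos(2*pi*(0:chunkLen-1).'/chunkLen);

% Number of complete chunks that fit in the signal.
nChunks = floor((numel(x) - overlap) / hop);

S = zeros(nfft/2 + 1, nChunks);          % rows: frequency, columns: time
for k = 1:nChunks
    first   = (k-1)*hop + 1;
    seg     = x(first : first+chunkLen-1) .* win;
    spec    = fft(seg, nfft);
    S(:, k) = spec(1 : nfft/2 + 1);      % keep the non-negative frequencies
end

f = (0 : nfft/2).' * fs / nfft;                  % bin frequencies in Hz
t = ((0:nChunks-1) * hop + chunkLen/2) / fs;     % chunk centres in seconds

% Transposing S puts frequency on the x-axis and time on the y-axis.
imagesc(f/1e3, t, 20*log10(abs(S) + eps).');
axis xy
xlabel('frequency (kHz)')
ylabel('time (s)')
```

The loop is just "window a chunk, FFT it, keep half the spectrum", so the same structure should carry over to Java with any FFT routine; only the final transpose decides which axis time ends up on.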
The code demonstrates how to specify the chunk length (here, 1000 samples, or ≈⅛ seconds) and the amount of overlap (90% of the chunk length, so 900 samples of overlap). Note that the overlap can be at most chunk size - 1.
If you just play around with the chunk length, you’ll get a feel for the main knob the STFT gives you to tune. Usually one picks an overlap between 25% and 50% of the chunk size for reasonably smooth spectrograms without a huge amount of computational overhead.
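To make the chunk/overlap trade-off concrete, here is a small worked calculation. The chunk counts assume only complete chunks are kept, with no padding at the edges; arf.stft may handle the edges differently, so treat the counts as approximate.

```matlab
% Worked arithmetic for chunk length and overlap (names are illustrative).
handel = load('handel');                 % same built-in clip as above
fs = handel.Fs;
nSamples = numel(handel.y);

samplesPerChunk = 1000;
overlapSamples  = round(samplesPerChunk * 0.9);     % 900 samples
hopSamples      = samplesPerChunk - overlapSamples; % 100 samples

chunkSeconds = samplesPerChunk / fs;     % duration covered by one chunk
hopSeconds   = hopSamples / fs;          % time step between spectrogram rows

% Complete chunks only; an assumption about edge handling.
nChunks90 = floor((nSamples - overlapSamples) / hopSamples);

% The same clip with 50% overlap needs far fewer FFTs.
overlap50 = round(samplesPerChunk * 0.5);
nChunks50 = floor((nSamples - overlap50) / (samplesPerChunk - overlap50));

fprintf('chunk: %.3f s, hop: %.4f s\n', chunkSeconds, hopSeconds);
fprintf('chunks at 90%% overlap: %d, at 50%% overlap: %d\n', ...
    nChunks90, nChunks50);
```

Going from 50% to 90% overlap shrinks the hop five-fold, so roughly five times as many FFTs are computed for the same clip.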
N.B. You can increase smoothness along the frequency dimension by passing in an extra argument to arf.stft, specifically, arf.stft(..., 'nfft', 2^nextpow2(samplesPerChunk * 8)). This explicitly sets the number of frequency bins to create (eventually, an FFT of this size is evaluated). The default is equivalent to 2^nextpow2(samplesPerChunk), so multiplying it by eight will upsample the spectrum for each chunk eight-fold.
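As a quick check of those numbers, the snippet below works out both FFT sizes and the resulting bin spacing. It assumes the sample rate is 8192 Hz, which is what handel.Fs holds for the clip shipped with Matlab; the variable names are illustrative.

```matlab
% FFT size and frequency-bin spacing, default versus eight-fold padding.
fs = 8192;                 % assumed: the value of handel.Fs
samplesPerChunk = 1000;

nfftDefault = 2^nextpow2(samplesPerChunk);       % 1024
nfftPadded  = 2^nextpow2(samplesPerChunk * 8);   % 8192

binSpacingDefault = fs / nfftDefault;   % 8 Hz between bins
binSpacingPadded  = fs / nfftPadded;    % 1 Hz between bins

fprintf('default: nfft = %d, %g Hz per bin\n', nfftDefault, binSpacingDefault);
fprintf('padded:  nfft = %d, %g Hz per bin\n', nfftPadded, binSpacingPadded);
```

Keep in mind that this zero-padding only interpolates the spectrum more finely. The ability to separate two nearby tones is still set by the chunk length, not by nfft.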