I'm trying to plot the mel-spectrogram of blast fishing sounds. The level of background sound is relatively similar across all recordings.
When I plot files that contain a blast, the background sound shows as quiet, whereas in all files without a blast it appears much louder (see examples). Apologies, I don't know the right terminology and would appreciate insight on this.
I suspect this is because the blast is a loud event that raises the amplitude range of the file. How can I standardise this across all spectrograms so the background sound is shown at a similar amplitude (so the plot on the right looks like the one on the left)? For example, is there a parameter I can extract from a bomb file and use as a reference for all the others?
import librosa
import matplotlib.pyplot as plt

# calculate mel features
audio, sr = librosa.load(path=audio_path, sr=sample_rate)
mels = librosa.feature.melspectrogram(y=audio, sr=sr)
mels_db = librosa.power_to_db(S=mels, ref=1.0)
# plot
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(111)
cax = ax.imshow(mels_db, interpolation='nearest', cmap='coolwarm', origin='lower')
ax.set_title('Mel spec')
plt.show()
Left = bomb, right = no bomb:
The likely issue here is the mapping of values to colors done by imshow. By default, it automatically maps the minimum and maximum values of the data to the 0.0-1.0 range of the color map, and those extremes are likely to differ between two separate spectrograms. To get the same intensity mapping, specify vmin and vmax in the call to imshow. You can set them to fixed values, or compute them across all the spectrograms you want to compare.
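A minimal sketch of the second option, computing the limits across all spectrograms and passing them to imshow. The helper names shared_color_limits and plot_with_shared_scale are hypothetical, and the two arrays are synthetic stand-ins for the mels_db values from the question, not real recordings.

```python
import numpy as np
import matplotlib.pyplot as plt


def shared_color_limits(spectrograms):
    """Return (vmin, vmax) spanning every dB spectrogram in the list."""
    vmin = min(float(np.min(s)) for s in spectrograms)
    vmax = max(float(np.max(s)) for s in spectrograms)
    return vmin, vmax


def plot_with_shared_scale(spectrograms, titles):
    """Plot spectrograms side by side with one colour mapping for all."""
    vmin, vmax = shared_color_limits(spectrograms)
    fig, axes = plt.subplots(1, len(spectrograms), figsize=(8, 5), squeeze=False)
    for ax, spec, title in zip(axes[0], spectrograms, titles):
        im = ax.imshow(spec, interpolation='nearest', cmap='coolwarm',
                       origin='lower', aspect='auto', vmin=vmin, vmax=vmax)
        ax.set_title(title)
    # One colour bar is enough because every panel uses the same limits.
    fig.colorbar(im, ax=list(axes[0]), label='dB')
    return fig


# Synthetic stand-ins for mels_db (assumption: real data would come from
# librosa.power_to_db as in the question).
rng = np.random.default_rng(0)
quiet = rng.uniform(-60.0, -40.0, size=(128, 100))  # background only
blast = quiet.copy()
blast[:, 40:45] = 0.0                                # short, loud event

vmin, vmax = shared_color_limits([blast, quiet])
fig = plot_with_shared_scale([blast, quiet], ['bomb', 'no bomb'])
```

Call plt.show() afterwards to display the figure. With shared limits, the background in both panels maps to the same colors regardless of whether a blast is present.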
PS: you can get slightly nicer plots for spectrograms by using librosa.display.specshow. In particular, one can easily get the X axis to be time in seconds and the Y axis to show correct frequencies. For it to work correctly, one must specify the spectrogram parameters, such as sr, hop_length, etc.