python signal-processing librosa

What is the conceptual purpose of librosa.amplitude_to_db?


I'm using the librosa library to get and filter spectrograms from audio data.

I mostly understand the math behind generating a spectrogram:

  1. Get signal
  2. Window the signal
  3. For each window, compute the Fourier transform
  4. Create a matrix whose columns are the transforms
  5. Plot a heat map of this matrix
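
For illustration, steps 1 to 4 can be sketched in plain Python using only the standard library. This is a deliberately naive version and not how librosa implements its STFT (librosa's defaults, such as padding the signal so frames are centered, are not reproduced). The names `hann`, `dft_magnitudes`, `naive_spectrogram`, `frame_len` and `hop` are made up for this sketch, and the test signal is an arbitrary sine wave.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def naive_spectrogram(signal, frame_len=64, hop=16):
    """Return a matrix with one row per frequency bin and one column per frame."""
    window = hann(frame_len)
    columns = []
    # Step 2: slide a window over the signal.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        # Step 3: Fourier transform of each windowed frame.
        columns.append(dft_magnitudes(frame))
    # Step 4: transpose so the matrix columns are the per-frame transforms.
    return [list(row) for row in zip(*columns)]


# Step 1: an assumed test signal, a sine completing 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = naive_spectrogram(signal)

# The strongest frequency bin in the first frame should match the sine's frequency.
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
print(len(spec), len(spec[0]), peak_bin)
```

Step 5 would then pass `spec` to a plotting routine of your choice.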

So that's really easy with librosa:

spec = np.abs(librosa.stft(signal, n_fft=len(window), window=window))

Yay! I've got my matrix of FFTs. Now I see this function librosa.amplitude_to_db and I think this is where my ignorance of signal processing starts to show. Here is a snippet I found on Medium:

spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)

Why does the author use this amplitude_to_db function? Why not just plot the output of the STFT directly?


Solution

  • The range of perceivable sound pressure is very wide, from around 20 μPa (micropascals) to 20 Pa, a ratio of 1 million to one. Furthermore, human perception of sound level is not linear; it is better approximated by a logarithm.

    By converting to decibels (dB), the scale becomes logarithmic. This compresses the numerical range to something like 0-120 dB. When the result is plotted, the color intensity corresponds more closely to what we hear than it would on a linear scale.

    Note that the reference (0 dB) point in decibels can be chosen freely. The default for librosa.amplitude_to_db is ref=1.0; the snippet in the question passes ref=np.max instead, so the maximum value of the input is mapped to 0 dB and all other values are zero or negative. The function also limits the dynamic range through top_db, which defaults to 80 dB: anything more than 80 dB below the peak is clipped, so with ref=np.max nothing falls below -80 dB.
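
To make the arithmetic concrete, here is a minimal sketch of the conversion in plain Python. It is not librosa's implementation. The function name `amplitude_to_db_sketch` is hypothetical, its parameters only mirror the documented `ref`, `amin` and `top_db` arguments, and unlike librosa it takes `ref` as a number rather than a callable such as np.max. The sample amplitudes are made up.

```python
import math


def amplitude_to_db_sketch(amplitudes, ref=1.0, amin=1e-5, top_db=80.0):
    """Convert amplitudes to dB relative to `ref`, then clip the dynamic range."""
    # 20 * log10 because these are amplitudes; power values would use 10 * log10.
    # `amin` guards against taking the logarithm of zero.
    db = [20.0 * math.log10(max(a, amin) / max(ref, amin)) for a in amplitudes]
    if top_db is not None:
        # Nothing may lie more than `top_db` below the loudest value.
        floor = max(db) - top_db
        db = [max(v, floor) for v in db]
    return db


# The audible pressure range quoted above, 20 uPa to 20 Pa, expressed in dB.
audible_range_db = 20.0 * math.log10(20.0 / 20e-6)
print(audible_range_db)

# Made-up amplitudes, referenced to their maximum (the effect of ref=np.max).
amps = [2.0, 0.2, 2e-6]
db = amplitude_to_db_sketch(amps, ref=max(amps))
print(db)
```

The ratio of one million comes out as about 120 dB, the loudest value lands at 0 dB, a value ten times smaller lands at about -20 dB, and the very quiet value is clipped to -80 dB.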