python signal-processing audio-processing

How to compute loudness from audio signal?

I have an audio signal and I want to detect loud moments from it.

The problem I have is that I am not sure whether the algorithm and code I present below are correct.

I read in many posts that the concept of "loudness" is complex and depends on the individual. I also read that it can be approximated using a spectrogram, A-weighting and RMS. I'm a novice at audio processing, but based on what I've read, I wrote the following algorithm:

Compute the spectrogram using STFT
Convert it to dB
Apply A-weighting
Compute RMS

The corresponding code I've written using Librosa is:

import librosa
import numpy as np

# Load the input audio
y, sr = librosa.load(path, sr=22050)

# Compute the spectrogram (magnitude)
n_fft = 2048
hop_length = 1024
spec_mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))

# Convert the spectrogram into dB
spec_db = librosa.amplitude_to_db(spec_mag)

# Compute A-weighting values
freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
a_weights = librosa.A_weighting(freqs)
a_weights = np.expand_dims(a_weights, axis=1)

# Apply the A-weighting to the spectrogram in dB
spec_dba = spec_db + a_weights

# Compute the "loudness" value (frame_length must match n_fft when passing S)
loudness = librosa.feature.rms(S=librosa.db_to_amplitude(spec_dba), frame_length=n_fft)

Am I on a good track? Is this algorithm correct? Am I using Librosa correctly?

Thank you for your help.

Solution

I don't know your programming language, but the concept seems almost right.

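Though you do not strictly need to go through dB for the A-weighting: the weights can also be expressed as linear gains (10^(dB/20)) and multiplied directly into the magnitude spectrogram. If your weighting function returns dB values instead, adding them to a dB spectrogram, as you do, gives the same result.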
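To illustrate the point about linear versus dB weighting, here is a minimal NumPy sketch. The helper `a_weighting_db` is a hypothetical name, written from the standard analytic A-weighting formula (IEC 61672) rather than taken from Librosa, and the random spectrogram is only stand-in data. It checks that multiplying magnitudes by a linear gain matches adding dB weights and converting back:

```python
import numpy as np

def a_weighting_db(freqs, min_db=-80.0):
    """A-weighting curve in dB (analytic IEC 61672 form), floored at min_db."""
    f2 = np.asarray(freqs, dtype=float) ** 2
    num = 12194.0**2 * f2**2
    den = ((f2 + 20.6**2)
           * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
           * (f2 + 12194.0**2))
    with np.errstate(divide="ignore"):  # log10(0) at 0 Hz gives -inf, floored below
        weights = 20.0 * np.log10(num / den) + 2.0
    return np.maximum(weights, min_db)

# Stand-in magnitude spectrogram: (n_fft // 2 + 1) frequency bins x 10 frames
sr, n_fft = 22050, 2048
freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
rng = np.random.default_rng(0)
spec_mag = rng.random((freqs.size, 10)) + 1e-3

weights_db = a_weighting_db(freqs)[:, np.newaxis]

# Route 1: weight in the linear domain
spec_a_linear = spec_mag * 10.0 ** (weights_db / 20.0)

# Route 2: convert to dB, add the weights, convert back
spec_a_via_db = 10.0 ** ((20.0 * np.log10(spec_mag) + weights_db) / 20.0)

same = np.allclose(spec_a_linear, spec_a_via_db)
```

With Librosa the two routes can differ slightly, because `librosa.amplitude_to_db` by default clips values more than `top_db` (80 dB) below the peak.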

For LUFS the usual way is as follows:

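For each channel individually:

Apply the K-weighting filter (ITU-R BS.1770 defines it as a high-shelf filter followed by a high-pass filter; it is a different curve from A-weighting, and it can also be applied in the frequency domain after an FFT)
Mean square
In the case of surround or immersive audio formats: each channel gets a weight of a few dB before summing, for example about +1.5 dB for the rear channels in 5.1 under BS.1770
Sum the channels
10 * log10 of the sum (BS.1770 also adds a constant offset of -0.691 dB)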
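The per-channel steps above can be sketched roughly as follows. This is a minimal sketch, not a complete meter: it assumes 48 kHz input, implements the K-weighting as the two biquad stages whose coefficients are published in ITU-R BS.1770 for that sample rate, and adds the -0.691 dB offset from the same recommendation. The function names (`biquad`, `k_weight`, `loudness_lkfs`) and the test tone are hypothetical and only for illustration.

```python
import numpy as np

# Biquad coefficients (b0, b1, b2, a1, a2) from ITU-R BS.1770, valid for 48 kHz only
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)

def biquad(x, coeffs):
    """Second-order IIR filter, direct form I, starting from zero state."""
    b0, b1, b2, a1, a2 = coeffs
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

def k_weight(x):
    """K-weighting: high-shelf stage followed by high-pass stage."""
    return biquad(biquad(x, SHELF), HIGHPASS)

def loudness_lkfs(channels, gains):
    """Ungated loudness: weighted sum of per-channel mean squares, in dB."""
    total = sum(g * np.mean(k_weight(ch) ** 2) for ch, g in zip(channels, gains))
    return -0.691 + 10.0 * np.log10(total)

# One second of a full-scale 997 Hz sine in a single front channel (weight 1.0)
t = np.arange(48000) / 48000.0
sine = np.sin(2.0 * np.pi * 997.0 * t)
mono_loudness = loudness_lkfs([sine], [1.0])
```

For this test tone the result should land near -3.01, which is the calibration value BS.1770 gives for a full-scale 997 Hz sine in one front channel. For other sample rates the coefficients would have to be recomputed.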

This gives the running values used for history or realtime display.

Then for the integrated LUFS value you'd do:

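Integrate the results (accumulate the values over the whole track, while leaving out moments that fall below the gate; BS.1770 uses an absolute gate at -70 LUFS plus a relative gate 10 LU below the level measured after absolute gating)
In the case of surround formats: the LFE channel is not part of the game, you completely ignore it.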
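The integration step can be sketched like this. It is a minimal NumPy sketch under stated assumptions: `block_powers` is a hypothetical list of channel-summed, K-weighted mean-square values, one per measurement block, and the two thresholds (-70 absolute, -10 relative) are taken from ITU-R BS.1770 rather than from this thread.

```python
import numpy as np

def block_loudness(power):
    """Convert a (channel-summed, K-weighted) mean-square value to loudness in dB."""
    return -0.691 + 10.0 * np.log10(power)

def integrated_loudness(block_powers, abs_gate=-70.0, rel_gate=-10.0):
    """Gated average over blocks: absolute gate first, then relative gate."""
    z = np.asarray(block_powers, dtype=float)
    z = z[z > 0.0]                              # drop digital silence (log of 0)
    z = z[block_loudness(z) > abs_gate]         # absolute gate
    if z.size == 0:
        return -np.inf                          # nothing above the absolute gate
    rel_threshold = block_loudness(z.mean()) + rel_gate
    z = z[block_loudness(z) > rel_threshold]    # relative gate
    return block_loudness(z.mean())

# Eight blocks of programme material followed by four near-silent blocks
blocks = [0.1] * 8 + [1e-12] * 4
gated = integrated_loudness(blocks)
ungated = block_loudness(np.mean(blocks))
```

Here the near-silent blocks are excluded by the absolute gate, so `gated` reflects only the eight louder blocks, while `ungated` is pulled down by the silence.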