
How to get similar results to pydub.silence.detect_nonsilent() using librosa.effects.split()?


I love pydub. It is simple to understand. But when it comes to detecting non-silent chunks, librosa seems much faster. So I want to try using librosa in a project to speed my code up.

So far, I have been using pydub like this (segment is an AudioSegment):

import pydub.silence

thresh = segment.dBFS - (segment.max_dBFS - segment.dBFS)
non_silent_ranges = pydub.silence.detect_nonsilent(segment, min_silence_len=1000, silence_thresh=thresh)

The thresh formula mostly works well, and when it does not, moving it up or down by 5 dB or so does the trick.

Using librosa, I am trying this (y is a numpy array loaded with librosa.load(), with an sr of 22050):

non_silent_ranges = librosa.effects.split(y, frame_length=sr, top_db=mistery)

To get results similar to pydub's, I tried setting mistery to the following:

mistery = y.mean() - (y.max() - y.mean())

and the same after converting y to dB:

ydbs = librosa.amplitude_to_db(y)
mistery = ydbs.mean() - (ydbs.max() - ydbs.mean())

In both cases, the results are very different from what I get from pydub.

I have no background in audio processing, and although I have read about RMS, dBFS, etc., I just don't get it. I guess I am getting old :)

Could somebody point me in the right direction? What would be the equivalent of my pydub solution in librosa? Or at least, could you explain how to get pydub's max_dBFS and dBFS values in librosa? (I already know how to convert an AudioSegment to the equivalent librosa numpy array, thanks to the excellent answer here.)


Solution

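  • max_dBFS is never above 0 by its nature: it is the level of the loudest sample relative to the maximum possible sample value, so it is exactly 0 only when the signal peaks at full scale. dBFS is how much "quieter" the sound (its RMS level) is than the maximum possible signal.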

    I suspect another part of your issue is that ydbs.max() is the maximum value in your data, not the maximum possible value (full scale). For integer samples, full scale is set by the integer type; for the floating-point arrays returned by librosa.load(), which lie in [-1, 1], it is 1.0.

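    Another difference from pydub is your use of ydbs.mean(): pydub computes dBFS from the RMS (root mean square) of the raw samples and only then converts it to decibels. Averaging values that are already in dB gives a different number.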

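    You can compute pydub's dBFS from the raw samples y (not from ydbs) like so:

    import numpy as np
    
    # y holds raw samples, not dB values: take the RMS first and convert to dB last.
    if np.issubdtype(y.dtype, np.signedinteger):
        max_sample_value = -float(np.iinfo(y.dtype).min)  # 2**(bits - 1), as pydub uses
    else:
        max_sample_value = 1.0  # librosa.load() returns floats in [-1, 1]
    
    y_rms = np.sqrt(np.mean(np.square(y, dtype=np.float64)))
    y_dbfs = 20 * np.log10(y_rms / max_sample_value)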