pydub computes RMS differently from SoX


I am confused by how pydub computes RMS.

In [187]: audio = AudioSegment.from_mp3("sample.mp3")
In [188]: audio.rms
Out[188]: 1041

In [189]: audio.dBFS
Out[189]: -29.959984108983633

However using sox:

$ sox sample.mp3 -n stat
Samples read:         130231296
Length (seconds):   1476.545306
Scaled by:         2147483647.0
Maximum amplitude:     1.000000
Minimum amplitude:    -1.000000
Midline amplitude:    -0.000000
Mean    norm:          0.017384
Mean    amplitude:    -0.000023
RMS     amplitude:     0.031763
Maximum delta:         1.308396
Minimum delta:         0.000000
Mean    delta:         0.015841
RMS     delta:         0.028429
Rough   frequency:         6282
Volume adjustment:        1.000

Can anyone explain how these RMS values are computed? Thanks.


Solution

  • They represent the same value, just on different scales. pydub appears to work with signed 16-bit values (possibly because the MP3 file has a 16-bit sample depth), while SoX by default scales its internal signed 32-bit values to [-1, 1]. You can bring the two outputs into agreement by multiplying the SoX value by 2^15 (0.031763 × 32768 ≈ 1041), or by telling the SoX stat effect to use a 16-bit scale with its -s argument. As 2^31 / 2^15 is 2^16, that should be -s 65536.
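
    The arithmetic can be checked with a small sketch. It assumes 16-bit signed samples (full scale 2^15 = 32768) and uses the numbers quoted in the question. The helper names sox_rms_to_16bit and rms_to_dbfs are made up for illustration and are not part of pydub or SoX.

```python
import math

# Values copied from the question; nothing here reads the audio file.
SOX_RMS = 0.031763            # "RMS amplitude" reported by `sox ... stat`
PYDUB_RMS = 1041              # audio.rms reported by pydub
FULL_SCALE_16BIT = 2 ** 15    # 32768, assuming 16-bit signed samples


def sox_rms_to_16bit(rms_normalized, full_scale=FULL_SCALE_16BIT):
    """Rescale an RMS value from [-1, 1] to the 16-bit integer scale (hypothetical helper)."""
    return rms_normalized * full_scale


def rms_to_dbfs(rms, full_scale=FULL_SCALE_16BIT):
    """Convert an RMS amplitude to dB relative to full scale (hypothetical helper)."""
    return 20 * math.log10(rms / full_scale)


rms_16bit = sox_rms_to_16bit(SOX_RMS)
print(round(rms_16bit))          # 1041, matching audio.rms
print(rms_to_dbfs(PYDUB_RMS))    # about -29.96, matching audio.dBFS
```

    If the file had a different sample width, the full-scale constant would need to change accordingly.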