python, numpy, audio, scipy, spectrum

Why is there a dB difference in the spectrum analysis between Sonic Visualizer and my Python script?


It seems I have an issue in my implementation of a function that creates a frequency spectrum from an audio file. I am asking this question in the hope that someone can spot the problem.

You can download the 32-bit float WAV audio file here.

I am working on a script that creates a spectrum analysis of an audio file using SciPy and NumPy. Before I started, I analyzed the file with Sonic Visualizer, which gave me the following result:

Sonic Visualizer Result

Now I have tried to reproduce this result with my Python script, but I get a different one:

Script Result

Everything looks right except the scale of the dB values. At 100 Hz, Sonic Visualizer is at -40 dB and my script is at -65 dB, so I assume there is a problem in my script's conversion of the FFT result to dBFS.

If I overlay the curve from Sonic Visualizer on my script's output, it is obvious that the level conversion is off by some factor:

Comparison

A minimal version of my script, using the 'demo.wav' file above, looks like this:

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile as wavfile
from scipy.signal import savgol_filter

def db_fft(data, sample_rate):
    data_length = len(data)
    # Apply a Hann window to reduce spectral leakage
    weighting = np.hanning(data_length)
    data = data * weighting
    # Single-sided FFT and the matching frequency bins
    values = np.fft.rfft(data)
    frequencies = np.fft.rfftfreq(data_length, d=1. / sample_rate)
    # Scale to a single-sided amplitude spectrum, compensating for the window gain
    s_mag = np.abs(values) * 2 / np.sum(weighting)
    # Convert the amplitude to dBFS (full scale = 1.0 for the 32-bit float data)
    s_dbfs = 20 * np.log10(s_mag)
    return frequencies, s_dbfs

audio_file = Path('demo.wav')
sample_rate, data = wavfile.read(str(audio_file))
# Use the first 4096 samples as the analysis window, like Sonic Visualizer
data = data[0:4096]
x_labels, s_dbfs = db_fft(data, sample_rate)
# Smooth the spectrum for the second plot line
flat_data = savgol_filter(s_dbfs, 601, 3)
plt.style.use('seaborn-whitegrid')
plt.figure(dpi=150, figsize=(16, 9))
plt.semilogx(x_labels, s_dbfs, alpha=0.4, color='tab:blue', label='Spectrum')
plt.semilogx(x_labels, flat_data, color='tab:blue', label='Spectrum (with filter)')
plt.title(audio_file.name)
plt.ylim([-160, 0])
plt.xlim([10, 10000])
plt.xlabel('Frequency [Hz]')
plt.ylabel('Amplitude [dB]')
plt.grid(True, which="both")
target_name = audio_file.parent / (audio_file.stem + '.png')
plt.savefig(str(target_name))

The script converts the 32-bit float audio file into a dBFS spectrum diagram, using the first 4096 samples as the analysis window, just as Sonic Visualizer does.

Where is the problem in my script, and why do I get a different result?


Solution

    1. Different decibels

    The first big difference is that Sonic Visualizer uses the "power ratio" definition of the decibel, from this Wikipedia page:

    When expressing a power ratio, the number of decibels is ten times its logarithm to base 10.

    I have also verified this in the v4.0.1 source code (in svcore/base/AudioLevel.cpp, line 54):

    double dB = 10 * log10(multiplier);
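
    In the question's script this corresponds to the final conversion line. As a minimal, self-contained illustration of how far apart the two definitions are (the magnitude value below is made up purely for demonstration):

    import numpy as np

    s_mag = 0.01  # hypothetical spectrum magnitude, for illustration only

    # amplitude ("field quantity") decibels, as in the question's script
    print(20 * np.log10(s_mag))  # ≈ -40.0

    # "power ratio" decibels, matching the Sonic Visualizer line above
    print(10 * np.log10(s_mag))  # ≈ -20.0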
    

    2. Different magnitude calculation

    They appear to simply divide by the size of the window when calculating the magnitude. This changes the calculation to

    s_mag = np.abs(values) * 2 / data_length
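
    For a Hann window of length N, the coefficients sum to (N - 1) / 2, so the two normalizations differ by roughly a factor of two in magnitude (about 6 dB in amplitude terms, or 3 dB in power terms). A quick sanity check, assuming the 4096-sample window from the question:

    import numpy as np

    data_length = 4096
    weighting = np.hanning(data_length)

    print(np.sum(weighting))                # ≈ 2047.5, i.e. (data_length - 1) / 2
    print(data_length / np.sum(weighting))  # ≈ 2.0, so dividing by data_length roughly halves s_mag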
    

    3. "Corrected" result

    I have not found a way to export their spectrum, but I have manually read off the first few values (note: these are the linear magnitudes, not the dB values) as

    theirvalues = [
        0.00074, 
        0.000745865, 
        0.00119605, 
        0.0013713, 
        0.0011812, 
        0.000746891, 
        0.000334177,
        0.000163241,
        7.57671e-5,
        3.17983e-5,
        2.91934e-5,
        3.74938e-5
    ]
    

    With the two changes I have mentioned, the graphs compare as follows:

    Comparison graph

    It's still not an exact match, but it's much closer. I suspect there is still some kind of smoothing involved (there are references to hops in the code, but I can't quite suss out what they're doing).
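
    For reference, here is a minimal sketch of the question's db_fft with both of the changes above applied (division by the window length and power-ratio decibels); the function name is mine, and the result should only approximate what Sonic Visualizer computes:

    import numpy as np

    def db_fft_sv_style(data, sample_rate):
        data_length = len(data)
        weighting = np.hanning(data_length)
        values = np.fft.rfft(data * weighting)
        frequencies = np.fft.rfftfreq(data_length, d=1. / sample_rate)
        # change 2: normalize by the window length instead of the sum of its coefficients
        s_mag = np.abs(values) * 2 / data_length
        # change 1: "power ratio" decibels, i.e. 10 * log10 instead of 20 * log10
        s_db = 10 * np.log10(s_mag)
        return frequencies, s_db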