Tags: c, audio, type-conversion, pulseaudio, decibel

16Bit Little Endian Byte Array to Integer Array to Decibel Value


I'm using the PulseAudio API to get the current microphone input in "realtime". The buffer data is delivered as a 16-bit little-endian byte array. What I'd like to do is find the maximum peak level in the buffer and transform it into a decibel value. To do that I have to convert each pair of bytes into one integer value. In the same loop I also look for the maximum value. After that I convert the maximum value into a decibel value. Here is the C code:

static ssize_t loop_write(int fd, const uint8_t *data, size_t size) 
{
int newsize = size / 2;
uint16_t max_value = 0;
int i = 0;

for (i = 0; i < size; i += 2)
{
    // put two bytes into one integer
    uint16_t val = data[i] + ((uint32_t)data[i+1] << 8);

    // find max value
    if(val > max_value)
       max_value = val;
}

// convert to decibel
float decibel = max_value / pow(2, 15);

if(decibel != 0)
    decibel = 20 * log(decibel);

// print result
printf("%f, ", decibel);

return size;
}

To my knowledge the amplitude value should be between 0 and 32768 for PA_SAMPLE_S16LE. But I am getting values between 0 and 65536 before the decibel conversion. Is there anything wrong with my conversion?

For the sake of completeness I am also posting my pulseaudio setup:

int main(int argc, char*argv[]) 
{
char *device = "alsa_input.usb-041e_30d3_121023000184-00-U0x41e0x30d3.analog-mono";

// The sample type to use
static const pa_sample_spec ss = {
    .format = PA_SAMPLE_S16LE,
    .rate = 44100,
    .channels = 1
};
pa_simple *s = NULL;
int ret = 1;
int error;

// Create the recording stream 
if (!(s = pa_simple_new(NULL, argv[0], PA_STREAM_RECORD, device, "record", &ss, NULL, NULL, &error))) {
    fprintf(stderr, __FILE__": pa_simple_new() failed: %s\n", pa_strerror(error));
    goto finish;
}

for (;;) {
    uint8_t buf[BUFSIZE];

    // Record some data ...
    if (pa_simple_read(s, buf, sizeof(buf), &error) < 0) {
        fprintf(stderr, __FILE__": pa_simple_read() failed: %s\n", pa_strerror(error));
        goto finish;
    }

    // And write it to STDOUT
    if (loop_write(STDOUT_FILENO, buf, sizeof(buf)) != sizeof(buf)) {
        fprintf(stderr, __FILE__": write() failed: %s\n", strerror(errno));
        goto finish;
    }
}

ret = 0;

finish:

if (s)
    pa_simple_free(s);

return 0;
}

Solution

  • What I'd like to do is to find out the maximum peak level in the buffer and transform it into a decibel value.

    From a physical point of view this approach doesn't make sense. While it is possible to specify single sample values in relation to the full dynamic range, you're probably more interested in the sound level, i.e. the power of the signal. A single peak, even if it's full scale, carries only very little energy; it may cause a very loud popping noise, due to harmonic distortion and limited bandwidth, but technically its power density is spread out over the whole band-limited spectrum.

    What you really should do is determine the RMS value (root mean square), i.e.

    RMS = sqrt( sum( square(samples) )/n_samples )
    

    EDIT: Note that the above is only correct for signals without a DC part. Most analog sound interfaces are AC coupled, so this is not a problem. But if there's a DC part as well, you must first subtract the mean value from the samples, i.e.

    RMS_DC_reject = sqrt( sum( square(samples - mean_sample) )/n_samples )
    

    I'll leave it as an exercise for the reader to add this to the code below.

    This gives you a measure of the power of the samples processed, which is what you actually want. You asked about decibels. Now I have to ask you: dB(what)? You need a reference value, since bels (or decibels) are a relative (i.e. comparative) measure. For a digital signal, full scale would be 0 dB(FS) and the zero line would be -20 log10( 2^B ), where B is the sampling bit depth. For a 16 bit signal that is about -96 dB(FS).

    If we're talking about a signal on the line, a common reference is a power of 1 mW; in that case the scale is dB(m). For audio line level it has been defined that full scale equals 1 mW of signal power, which is what 1 V RMS dissipates in a 1 kOhm resistor (there you have the RMS again).

    Now since our full scale is immediately determined by the input circuitry, which is defined in terms of dB(m), you can later display dB(FS) as dB(m) (or dBm) just fine.

    When it comes to the actual sound level, well, this depends on your input amplifier gain, and the conversion efficiency of the microphone used.


  • To my knowledge the amplitude value should be between 0 and 32768 for PA_SAMPLE_S16LE. But I am getting values between 0 and 65536 before the decibel conversion. Is there anything wrong with my conversion?

    You asked about a signed integer format, but you're assembling the values into an unsigned integer, which is why you see values up to 65535. Also note that log() in C is the natural logarithm; decibels need log10(). For a zero signal at 16 bits the outcome should be about -96 dB, but once the value has been scaled into the range [0; 1], a zero sample leads to log(0), which diverges to -infinity. Hence your if statement. But remember, this is physics, and physics is continuous; there should be no if statement here.
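
    The signed versus unsigned difference can be seen on a single byte pair. The two helpers below are hypothetical names for this illustration: the first reads the bytes the way the question does, the second interprets the same bit pattern as PA_SAMPLE_S16LE.

```c
#include <stdint.h>

/* Reads one little-endian 16 bit sample as an unsigned value,
 * 0..65535, like the code in the question. */
uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same bytes interpreted as a signed sample, -32768..32767. */
int16_t read_s16le(const uint8_t *p)
{
    uint16_t const u = read_u16le(p);
    /* Patterns with the top bit set are negative in two's complement. */
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}
```

    The bytes FF FF are the sample -1, a very quiet value, but read unsigned they become 65535. A signal that merely wobbles around zero therefore looks like it is hitting the top of the range.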

    You should write it like this

    // even for signed values this should be 2^N
    // we're going to deal with signed later
    // (SAMPLE_BITS is the bit depth, 16 for PA_SAMPLE_S16LE)
    double const MAX_SIGNAL = 1 << SAMPLE_BITS;
    
    // using double here, because float offers only 24 bits of
    // distortion free dynamic range.
    double accum = 0;
    size_t const n_samples = size/2;
    // i + 1 < size: never read past the end of an odd-sized buffer
    for (size_t i = 0; i + 1 < size; i += 2)
    {
        // put two bytes into one __signed__ integer
        int16_t val = (int16_t)(data[i] | (data[i+1] << 8));
    
        accum += (double)val*val;
    }
    accum /= n_samples;
    
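    // Signed samples only reach half of MAX_SIGNAL in either
    // direction, so the amplitude has to be doubled to match
    // the 2^N scale.
    double const rms = 2. * sqrt(accum);
    
    // Level relative to full scale. The floor of one step keeps
    // log10() finite without an if: silence gives about -96 dB,
    // a full scale square wave 0 dB.
    double const dB_FS = 20 * log10( fmax(rms, 1.) / MAX_SIGNAL );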