Tags: c++, c, floating-point, integer, 24-bit

C/C++ - Convert 24-bit signed integer to float


I'm programming in C++. I need to convert a 24-bit signed integer (stored in a 3-byte array) to float (normalizing to [-1.0,1.0]).

The platform is MSVC++ on x86 (which means the input is little-endian).

I tried this:

float convert(const unsigned char* src)
{
    int i = src[2];
    i = (i << 8) | src[1];
    i = (i << 8) | src[0];

    const float Q = 2.0 / ((1 << 24) - 1.0);

    return (i + 0.5) * Q;
}

I'm not entirely sure, but it seems the results I'm getting from this code are incorrect. So, is my code wrong and if so, why?


Solution

  • You are not sign-extending the 24 bits into an integer; the upper bits will always be zero, so negative samples come out as large positive values. This code will work no matter what your int size is, as long as int is at least 24 bits wide:

    if (i & 0x800000)
        i |= ~0xffffff;
    

    Edit: Problem 2 is your scaling constant. In simple terms, you want to multiply by the new maximum and divide by the old maximum, assuming that 0 remains at 0.0 after conversion.

    const float Q = 1.0 / 0x7fffff;
    

    Finally, why are you adding 0.5 in the final conversion? I could understand if you were trying to round to an integer value, but you're going the other direction.

    Edit 2: The source you point to has a very detailed rationale for your choices. Not the way I would have chosen, but perfectly defensible nonetheless. My advice for the multiplier still holds, but the maximum is different because of the added 0.5 offset:

    const float Q = 1.0 / (0x7fffff + 0.5);
    

    Because the positive and negative magnitudes are the same after the addition, this should scale both directions correctly.
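
Putting the pieces together, the conversion might look roughly like the sketch below. It keeps the 0.5 offset from the question and uses the Edit 2 scale factor. The helper name `load_s24le` is made up for illustration, and the intermediate math is done in `double`, which is an assumption on my part rather than something the question requires. Note that `1.0 / (0x7fffff + 0.5)` is numerically the same as the question's `2.0 / ((1 << 24) - 1.0)`, so with the offset kept, the sign extension is the change that actually alters the results.

```cpp
#include <cstdint>

// Hypothetical helper (name is illustrative): assembles a little-endian
// 24-bit sample and sign-extends it to 32 bits.
inline std::int32_t load_s24le(const unsigned char* src)
{
    std::int32_t i = src[2];
    i = (i << 8) | src[1];
    i = (i << 8) | src[0];

    // If bit 23 (the 24-bit sign bit) is set, fill the upper bits with ones.
    if (i & 0x800000)
        i |= ~0xffffff;

    return i;
}

inline float convert(const unsigned char* src)
{
    // After adding 0.5 the extremes are +/-(0x7fffff + 0.5),
    // so dividing by that magnitude maps them to +/-1.0.
    const double Q = 1.0 / (0x7fffff + 0.5);

    return static_cast<float>((load_s24le(src) + 0.5) * Q);
}
```

With this version the bytes `ff ff 7f` (largest positive sample) map to 1.0 and `00 00 80` (most negative sample) map to -1.0. Because of the offset, a raw value of 0 maps to a tiny positive number rather than exactly 0.0; if you need 0 to stay at 0.0, drop the offset and use `1.0 / 0x7fffff` instead, accepting that the most negative sample then lands slightly below -1.0.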