c casting char unsigned-short

Precisely convert 32-bit float to unsigned short or unsigned char


First of all, sorry if this is a duplicate; I couldn't find any existing question that answers mine.

I'm writing a small program that converts 32-bit floating-point values to unsigned short (16-bit) and unsigned char (8-bit) values, for use with HDR images.

From here I could get the following function (without clamping):

#include <stdint.h>

/* Scales x from [0, 1] to [0, 255]; truncates toward zero, no clamping. */
static inline uint8_t u8fromfloat(float x)
{
    return (int)(x * 255.0f);
}

I suppose unsigned short values could be obtained the same way, by multiplying by (pow(2, 16) - 1), i.e. 65535, instead.
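To make the two scalings concrete, here is a minimal sketch of both conversions. It assumes the input is meant to lie in [0, 1]. The clamping and the round-to-nearest step are additions that the function above does not have, and the names `clamp01`, `u8_from_float` and `u16_from_float` are made up for illustration.

```c
#include <stdint.h>

/* Clamp x to [0, 1]. NaN maps to 0 because (x > 0.0f) is false for NaN. */
static inline float clamp01(float x)
{
    if (!(x > 0.0f)) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

/* [0, 1] float -> 8 bits, rounding to nearest instead of truncating. */
static inline uint8_t u8_from_float(float x)
{
    return (uint8_t)(clamp01(x) * 255.0f + 0.5f);
}

/* [0, 1] float -> 16 bits; 65535 is 2^16 - 1. */
static inline uint16_t u16_from_float(float x)
{
    return (uint16_t)(clamp01(x) * 65535.0f + 0.5f);
}
```

With these definitions, `u8_from_float(0.5f)` gives 128 and `u16_from_float(1.0f)` gives 65535, while out-of-range inputs saturate at 0 or at the maximum value.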

But then I started thinking about ordered dithering, and Bayer dithering in particular. To convert to uint8_t I suppose I could use a 4x4 matrix, and an 8x8 matrix for unsigned short.
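For reference, a rough sketch of what 4x4 Bayer dithering could look like for the 8-bit case. It assumes values in [0, 1] and uses the standard 4x4 Bayer index matrix. The function name `u8_from_float_dithered` and the pixel coordinates `px`, `py` are hypothetical, and this is only one common way to apply the threshold.

```c
#include <stdint.h>

/* Standard 4x4 Bayer index matrix (values 0..15). */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Quantize v in [0, 1] to 8 bits with ordered dithering at pixel (px, py). */
static inline uint8_t u8_from_float_dithered(float v, unsigned px, unsigned py)
{
    if (!(v > 0.0f)) v = 0.0f;   /* also catches NaN */
    if (v > 1.0f) v = 1.0f;

    /* Per-pixel threshold in (0, 1); it replaces the fixed 0.5 of plain rounding. */
    float threshold = (bayer4[py & 3][px & 3] + 0.5f) / 16.0f;

    /* Truncation acts as floor here because the operand is non-negative. */
    return (uint8_t)(v * 255.0f + threshold);
}
```

For an input of 0.5 (which scales to 127.5), half of the pixels in each 4x4 tile come out as 127 and the other half as 128, so the tile average preserves the original value. The same idea would carry over to 16 bits by scaling with 65535 and, if desired, using a larger matrix.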

I also thought of a look-up table to speed up the process, like this:

uint16_t LUT[0x10000]; // holds 2¹⁶ values

and store 2^16 unsigned short values, each corresponding to a float. The same table could then be used for uint8_t as well, because of the implicit conversion between unsigned short ↔ unsigned int.

But wouldn't a look-up table like this be huge in memory? Also how would one fill a table like this?!

Now I'm confused. Which approach would you recommend?

EDIT after uwind's answer: Let's say I also want to do a basic color space conversion at the same time: first convert the color space (in float), then reduce the result to U8/U16. Wouldn't using a LUT be more efficient in that case? And yes, I would still have the problem of indexing the LUT.


Solution

  • The way I see it, the look-up table won't help since in order to index into it, you need to convert the float into some integer type. Catch 22.

    The table would require 0x10000 * sizeof (uint16_t) bytes, which is 128 KB. Not a lot by modern standards, but on the other hand cache is precious. But, as I said, the table doesn't add much to the solution since you need to convert float to integer in order to index.

    You could build a table indexed by the raw bits of the float reinterpreted as an integer, but the index would then have to be 32 bits wide, which makes the table very large: 2^32 entries of uint16_t is 8 GB.

    Go for the straightforward runtime conversion you outlined.
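The table sizes quoted above can be checked with a short sketch. The helper names `float_bits` and `lut_bytes` are made up for illustration, and `float_bits` assumes that `float` is a 32-bit IEEE 754 value, which is common but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float, copied with memcpy to avoid aliasing problems.
   Assumes sizeof(float) == sizeof(uint32_t). */
static inline uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Bytes needed for a table of uint16_t entries with an index_bits-wide index.
   index_bits must be small enough that the result fits in 64 bits. */
static inline uint64_t lut_bytes(unsigned index_bits)
{
    return ((uint64_t)1 << index_bits) * sizeof(uint16_t);
}
```

Here `lut_bytes(16)` is 131072 bytes (128 KB) and `lut_bytes(32)` is 8589934592 bytes (8 GB). A value from `float_bits` could in principle index the 32-bit table, which is why that variant is impractical.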