c, bit-manipulation

Bitwise absolute value of float/double in C (decimals lost during cast)


I'm trying to compare different ways of getting the absolute value of a float/double to find out which one is fastest, because I'll then have to apply it to huge arrays. When I use a cast and a bit mask, the decimals get lost in the process. (I must use C only.)

Here's my code:

uint64_t mask = 0x7fffffffffffffff;
double d1 = -012301923.15126;
double d2 = (double)(((uint64_t)d1) & mask);

And the output is:

d1 = -012301923.15126;
d2 = 012301923.00000;

So the decimals are lost during the conversion. Is there a fast way to get them back?

Thanks in advance.

Edit: I know about fabs(); I'd just like to try and compare different "handmade" solutions.


Solution

  • That's because your cast converts the floating-point value to an integer, which truncates the fractional part. It does not reinterpret the bits of the double.

    What you have is roughly equivalent to

    uint64_t temp = (uint64_t) d1;
    temp &= mask;
    d2 = temp;
    

    You could solve it with type punning using a union in between:

    union
    {
        uint64_t i;
        double   d;
    } u;
    
    u.d = d1;
    u.i &= mask;
    d2 = u.d;
    

    As noted by Bathsheba, this will in practice work with the major C++ compilers as well. However, the C specification explicitly allows it, while in C++ it is undefined behavior (IIRC).
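
For comparing "handmade" variants, the union approach can be wrapped in a small helper and set next to a memcpy-based one. This is a minimal sketch, not a benchmarked implementation. It assumes that double is a 64-bit IEEE 754 value whose sign is the most significant bit, and that integers and doubles share the same byte order. The names abs_union, abs_memcpy, abs_array and SIGN_CLEAR_MASK are made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t, UINT64_C */
#include <string.h>   /* memcpy */

/* Assumption: double is exactly 64 bits wide (IEEE 754 binary64). */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Every bit set except the sign bit. */
#define SIGN_CLEAR_MASK UINT64_C(0x7fffffffffffffff)

/* Type punning through a union, as in the answer above. */
static double abs_union(double x)
{
    union { uint64_t i; double d; } u;
    u.d = x;                  /* store the double's bit pattern */
    u.i &= SIGN_CLEAR_MASK;   /* clear the sign bit, keep everything else */
    return u.d;               /* read the same bits back as a double */
}

/* Same idea with memcpy, which copies the bits instead of converting. */
static double abs_memcpy(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= SIGN_CLEAR_MASK;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Apply the bit trick in place to an array of n doubles. */
static void abs_array(double *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = abs_memcpy(a[i]);
}
```

Calling abs_union(d1) or abs_memcpy(d1) with the value from the question keeps the fractional part, because only the sign bit is modified. Whether either variant is actually faster than fabs() depends on the compiler and target, so it is worth measuring on the real arrays.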