C - Unsigned long long to double on 32-bit machine


Hi, I have two questions:

  1. uint64_t vs double, which has a higher range limit for covering positive numbers?

  2. How to convert double into uint64_t if only the whole number part of double is needed.

Direct casting apparently doesn't work due to how double is defined.

Sorry for any confusion; I'm talking about the 64-bit double in C on a 32-bit machine.

As for an example:

//operation for conversion I used:
uint64_t sampleRate = (
                      (union { double i; uint64_t sampleRate; })
                      { .i = r23u.outputSampleRate}
                    ).sampleRate;

//the following are printouts on command line
//         double                               uint64_t
//printed by   %.16llx                           %.16llx
outputSampleRate  0x41886a0000000000      0x41886a0000000000 sampleRate

//printed by   %f                                    %llu
outputSampleRate  51200000.000000        4722140757530509312 sampleRate

So the two numbers keep the same bit pattern, but when printed as decimals, the uint64_t value is totally wrong. Thank you.


Solution

  • uint64_t vs double, which has a higher range limit for covering positive numbers?

    uint64_t, where supported, has 64 value bits, no padding bits, and no sign bit. It can represent all integers between 0 and 2^64 - 1, inclusive.

    Substantially all modern C implementations represent double in IEEE-754 64-bit binary format, but C does not require that format (support for it, as specified in Annex F of the standard, is optional). It is so common, however, that it is fairly safe to assume that format, perhaps with some compile-time checks against the <float.h> macros that describe floating-point characteristics. I will assume for the balance of this answer that the C implementation does use that representation.
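
The compile-time checks mentioned above could look something like the following. This is only a sketch: it assumes a C11 compiler (for static_assert), and the particular set of macros checked is my own choice, not something the answer specifies.

```c
#include <assert.h>  /* static_assert (C11) */
#include <float.h>   /* FLT_RADIX, DBL_MANT_DIG, DBL_MAX_EXP, DBL_MIN_EXP */

/* Refuse to compile unless double has the parameters of IEEE-754 binary64.
   Matching these parameters does not prove full IEEE-754 conformance. */
static_assert(FLT_RADIX == 2,       "double is not a binary format");
static_assert(DBL_MANT_DIG == 53,   "double does not have a 53-bit significand");
static_assert(DBL_MAX_EXP == 1024,  "unexpected maximum exponent for double");
static_assert(DBL_MIN_EXP == -1021, "unexpected minimum exponent for double");
```

If any assertion fails, compilation stops with the given message instead of producing a program that silently assumes the wrong format.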

    IEEE-754 binary double precision provides 53 bits of mantissa, therefore it can represent every integer between 0 and 2^53, inclusive. It is a floating-point format, however, with an 11-bit binary exponent. The largest finite number it can represent is (2^53 - 1) * 2^971, just under 2^1024 (about 1.8 * 10^308). In this sense, double has a much greater range than uint64_t, but the vast majority of integers between 0 and its maximum value cannot be represented exactly as doubles, including almost all of the numbers that can be represented exactly by uint64_t.
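
To make the exactness limit concrete, here is a small sketch that tests whether a given uint64_t survives a round trip through double. The function name survives_double_round_trip is hypothetical, and the code assumes the IEEE-754 layout discussed above.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

static_assert(DBL_MANT_DIG == 53, "this sketch assumes a 53-bit significand");

/* Hypothetical helper: returns 1 if n is unchanged after uint64_t -> double -> uint64_t. */
static int survives_double_round_trip(uint64_t n)
{
    /* volatile forces the value to be stored as a real double, so extra
       register precision (e.g. x87 on 32-bit x86) cannot hide the rounding. */
    volatile double d = (double)n;

    /* Values near UINT64_MAX round up to 2^64, which must not be converted back. */
    if (d >= 18446744073709551616.0)   /* 2^64 */
        return 0;

    return (uint64_t)d == n;
}
```

With this helper, 2^53 (9007199254740992) survives the round trip, while 2^53 + 1 does not, because it has no exact double representation.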

  • How to convert double into uint64_t if only the whole number part of double is needed?

    You can simply assign (conversion is implicit), or you can explicitly cast if you want to make it clear that a conversion takes place:

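    double my_double = 1.2345678e18;       /* must be below 2^64 (about 1.8e19) */
    uint64_t my_uint;
    uint64_t my_other_uint;
    
    my_uint = my_double;                   /* implicit conversion */
    my_other_uint = (uint64_t) my_double;  /* explicit conversion, same result */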
    

    Any fractional part of the double's value will be truncated. The integer part will be preserved exactly if it is representable as a uint64_t; otherwise, the behavior is undefined.
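
Because an out-of-range conversion is undefined, it may be worth checking the range first. The following is a minimal sketch of such a check; the helper name double_to_u64_checked is hypothetical, and the code assumes a binary double in which 2^64 is exactly representable.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(FLT_RADIX == 2 && DBL_MAX_EXP > 64,
              "2^64 must be exactly representable as a double");

/* Hypothetical helper: stores the whole-number part of d in *out and returns
   true if it fits in uint64_t; otherwise returns false and leaves *out alone. */
static bool double_to_u64_checked(double d, uint64_t *out)
{
    /* The conversion is defined only when the truncated value lies in
       [0, 2^64 - 1], i.e. when -1 < d < 2^64. Both comparisons are false
       for NaN, so NaN is rejected too. */
    if (!(d > -1.0 && d < 18446744073709551616.0))   /* 2^64 */
        return false;

    *out = (uint64_t)d;   /* truncates toward zero */
    return true;
}
```

For the value in the question, double_to_u64_checked(51200000.0, &v) succeeds and sets v to 51200000, while a value such as 1.2345678e48 is rejected instead of invoking undefined behavior.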

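    The code you presented uses a union to overlay storage of a double and a uint64_t. That's not inherently wrong, but it's not a useful technique for converting between the two types: it reinterprets the stored bits rather than converting the value, which is why your uint64_t printout shows the double's bit pattern read as an integer. Casts are C's mechanism for all non-implicit value conversions.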
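
The difference between reinterpreting bits and converting a value can be sketched as follows. Both function names are hypothetical; memcpy is used here in place of the union purely for illustration, and the bit pattern shown in the usage note assumes IEEE-754 binary64 with matching byte order for double and uint64_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "this sketch assumes a 64-bit double");

/* Reinterpretation: copies the object representation unchanged,
   which has the same effect as reading the other union member. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Value conversion: keeps the whole-number part of d.
   The caller must ensure that -1 < d < 2^64. */
static uint64_t double_whole_part(double d)
{
    return (uint64_t)d;
}
```

For the sample rate in the question, double_bits(51200000.0) yields 0x41886a0000000000 (4722140757530509312 in decimal), whereas double_whole_part(51200000.0) yields 51200000.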