Hi I have two questions:
uint64_t vs double, which has a higher range limit for covering positive numbers?
How to convert double into uint64_t if only the whole number part of double is needed.
Direct casting apparently doesn't work due to how double is defined.
Sorry for any confusion, I'm talking about the 64bit double in C on a 32bit machine.
As for an example:
//operation for convertion I used:
double sampleRate = (
(union { double i; uint64_t sampleRate; })
{ .i = r23u.outputSampleRate}
).sampleRate;
//the following are printouts on command line
// double uint64_t
//printed by %.16llx %.16llx
outputSampleRate 0x41886a0000000000 0x41886a0000000000 sampleRate
//printed by %f %llu
outputSampleRate 51200000.000000 4722140757530509312 sampleRate
So the two numbers remain the same bit pattern but when print out as decimals, the uint64_t is totally wrong. Thank you.
uint64_t vs double, which has a higher range limit for covering positive numbers?
uint64_t
, where supported, has 64 value bits, no padding bits, and no sign bit. It can represent all integers between 0 and 264 - 1, inclusive.
Substantially all modern C implementations represent double
in IEEE-754 64-bit binary format, but C does not require nor even endorse that format. It is so common, however, that it is fairly safe to assume that format, and maybe to just put in some compile-time checks against the macros defining FP characteristics. I will assume for the balance of this answer that the C implementation indeed does use that representation.
IEEE-754 binary double precision provides 53 bits of mantissa, therefore it can represent all integers between 0 and 253 - 1. It is a floating-point format, however, with an 11-bit binary exponent. The largest number it can represent is (253 - 1) * 21023, or nearly 21077. In this sense, double
has a much greater range than uint64_t
, but the vast majority of integers between 0 and its maximum value cannot be represented exactly as double
s, including almost all of the numbers that can be represented exactly by uint64_t
.
How to convert double into uint64_t if only the whole number part of double is needed
You can simply assign (conversion is implicit), or you can explicitly cast if you want to make it clear that a conversion takes place:
double my_double = 1.2345678e48;
uint64_t my_uint;
uint64_t my_other_uint;
my_uint = my_double;
my_other_uint = (uint64_t) my_double;
Any fractional part of the double
's value will be truncated. The integer part will be preserved exactly if it is representable as a uint64_t
; otherwise, the behavior is undefined.
The code you presented uses a union to overlay storage of a double
and a uint64_t
. That's not inherently wrong, but it's not a useful technique for converting between the two types. Casts are C's mechanism for all non-implicit value conversions.