java, floating-point, double, precision, ieee-754

Java float to double - upper and lower bounds?


As most here will know, converting double -> float loses precision. This means that multiple double values may be mapped to the same float value. But how do I go the other way? Given a normal float (I don't care about the extreme cases), how do I find the largest and smallest double values that are still mapped to that same float?

Or, to speak in code:

// Reference behaviour: round the query to float, then compare as floats.
static boolean testInterval(float lowF, float highF, double queryD) {
    float queryF = (float) queryD;
    return (lowF <= queryF) && (queryF <= highF);
}

and

// Alternative: widen the float bounds to double (exact), then compare as doubles.
static boolean testInterval(float lowF, float highF, double queryD) {
    double lowD = (double) lowF;
    double highD = (double) highF;
    return (lowD <= queryD) && (queryD <= highD);
}

do not always give the same result. I'm looking for two float -> double functions that make the second function return the same result as the first.

This could work, but it looks like a hack and not the proper solution to me.

// Attempted fix: widen the interval by Float.MIN_VALUE on each side.
static boolean testIntervalHack(float lowF, float highF, double queryD) {
    double lowD = (double) lowF - Float.MIN_VALUE;
    double highD = (double) highF + Float.MIN_VALUE;
    return (lowD <= queryD) && (queryD <= highD);
}

Solution

  • Your testIntervalHack doesn't work, because the width of the range of double values mapping to the same float varies with the magnitude of the float. For example, with x = 2^24-1, every double strictly between x-0.5 and x+0.5 is mapped to the same value (the float value of x), but in double arithmetic x +/- Float.MIN_VALUE == x, so the hack widens nothing.
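To make the counterexample concrete, here is a minimal sketch that runs the question's two functions on x = 2^24-1. The class name HackCounterexample and the query value x + 0.25 are chosen only for illustration.

```java
public class HackCounterexample {
    // Reference test from the question: round the query to float first.
    public static boolean testInterval(float lowF, float highF, double queryD) {
        float queryF = (float) queryD;
        return (lowF <= queryF) && (queryF <= highF);
    }

    // The question's hack: widen the interval by Float.MIN_VALUE on each side.
    public static boolean testIntervalHack(float lowF, float highF, double queryD) {
        double lowD = (double) lowF - Float.MIN_VALUE;
        double highD = (double) highF + Float.MIN_VALUE;
        return (lowD <= queryD) && (queryD <= highD);
    }

    public static void main(String[] args) {
        float x = 16777215f;       // 2^24 - 1, exactly representable as a float
        double query = x + 0.25;   // exact in double; rounds back to x as a float

        // Float.MIN_VALUE is far below the double spacing near x, so adding it is a no-op.
        System.out.println((double) x + Float.MIN_VALUE == (double) x); // true
        System.out.println(testInterval(x, x, query));                  // true
        System.out.println(testIntervalHack(x, x, query));              // false
    }
}
```

The two tests disagree on this query, which is exactly the mismatch the question wants to eliminate.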

    I'm not aware of any convenient API methods, so the best I can offer is

    1. convert to double
    2. convert the double to the bit representation via doubleTo(Raw)LongBits
    3. add or subtract one of 2^28 or 2^28-1, depending on whether you want the upper or lower bound and whether the 2^29 bit is 0 or 1 (because of round-to-even)
    4. convert that long to double via longBitsToDouble

    Well, that's for finite values in float range. For NaNs, you can stop after step 1. For infinities, it's a bit more delicate, since double values larger than or equal to 2^128-2^103 are converted to (float)Infinity, which is quite a bit away from the bit representation of (double)Infinity.
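Steps 1 to 4 could be sketched roughly as follows. This is only an illustration under stated assumptions: f is finite, nonzero and normal, with magnitude strictly above Float.MIN_NORMAL (at Float.MIN_NORMAL itself the bound toward zero would not be tight). The names FloatPreimage, outer, inner, lowerBound, upperBound and testIntervalExact are made up here. Because the double bit pattern is sign-magnitude, adding to the bits moves away from zero, so negative floats swap the roles of the two offsets.

```java
public class FloatPreimage {
    // A double has 29 more significand bits than a float, so half a float ulp
    // is 2^28 units in the last place of the double.
    private static final long HALF_ULP = 1L << 28;
    // Bit of the double that holds the float's least significant significand bit.
    private static final long FLOAT_LSB = 1L << 29;

    // Offset to the edge of the rounding interval. Ties round to even, so the
    // exact midpoint belongs to f only when f's last significand bit is 0.
    private static long delta(long bits) {
        return (bits & FLOAT_LSB) == 0 ? HALF_ULP : HALF_ULP - 1;
    }

    // Double of largest magnitude (same sign as f) that still rounds to f.
    private static double outer(float f) {
        long bits = Double.doubleToRawLongBits((double) f);
        return Double.longBitsToDouble(bits + delta(bits));
    }

    // Double of smallest magnitude (same sign as f) that still rounds to f.
    private static double inner(float f) {
        long bits = Double.doubleToRawLongBits((double) f);
        return Double.longBitsToDouble(bits - delta(bits));
    }

    // Assumes f is finite, nonzero, normal, and |f| > Float.MIN_NORMAL.
    public static double upperBound(float f) {
        return f > 0 ? outer(f) : inner(f);
    }

    // Assumes f is finite, nonzero, normal, and |f| > Float.MIN_NORMAL.
    public static double lowerBound(float f) {
        return f > 0 ? inner(f) : outer(f);
    }

    // Double-only comparison intended to agree with the float-based test
    // from the question, under the same assumptions on lowF and highF.
    public static boolean testIntervalExact(float lowF, float highF, double queryD) {
        return (lowerBound(lowF) <= queryD) && (queryD <= upperBound(highF));
    }

    public static void main(String[] args) {
        float f = 16777215f; // 2^24 - 1
        double lo = lowerBound(f);
        double hi = upperBound(f);
        System.out.println(lo + " .. " + hi);
        // Both ends still round to f; the next doubles outside do not.
        System.out.println((float) lo == f && (float) hi == f);
        System.out.println((float) Math.nextDown(lo) != f && (float) Math.nextUp(hi) != f);
    }
}
```

With these two helpers, the second function from the question can compare purely in double while accepting the same queries as the float-based version, as testIntervalExact illustrates.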