Tags: java, c#, double, ieee-754

Understanding IEEE-754 64-bit floating-point representation in C# and Java


Consider the following Java code:

public class Program {
    public static void main(String[] args) {
        double number = Double.MAX_VALUE;
        String formattedNumber = String.format("%f", number);
        System.out.println(formattedNumber);
    }
}

179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000

Consider the equivalent C# code:

using System;

public class Program
{
    public static void Main(string[] args)
    {
        double value = double.MaxValue;
        Console.WriteLine(value.ToString("F"));
    }
}

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000

Given that the maximum value of Double is 1.7976931348623157E+308, to my knowledge, the Java output is correct; i.e. the floating-point value actually represents an integer where the first 17 digits are 17976931348623157, followed by 292 zeros.

Note: Converting double to BigInteger in C# yields the same result:

using System.Numerics;

BigInteger value = (BigInteger)double.MaxValue;
Console.WriteLine(value);

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

Questions

  • Why are the values wildly different, and which one should be considered correct?
  • If C# is actually incorrect, how would I obtain the correct output, or at least output identical to what Java produced?

Solution

  • Given that the maximum value of Double is 1.7976931348623157E+308…

    This is incorrect. The format used for Double is the binary64 or “double precision” format specified in the IEEE 754-2019 standard. In this format, a finite number is represented as ±f•2^e, where f, the fraction portion of the representation, is the number represented by a binary numeral with one digit (0 or 1) before the radix point and 52 binary digits after it, and e, the exponent, is an integer in [−1022, 1023]. So the maximum representable finite value is +1.1111111111111111111111111111111111111111111111111111₂•2^1023 (52 ones after the radix point), which equals +(2 − 2^−52)•2^1023 = 2^1024 − 2^971. That is exactly 179,769,313,486,231,570,814,527,423,731,704,356,798,070,567,525,844,996,598,917,476,803,157,260,780,028,538,760,589,558,632,766,878,171,540,458,953,514,382,464,234,321,326,889,464,182,768,467,546,703,537,516,986,049,910,576,551,282,076,245,490,090,389,328,944,075,868,508,455,133,942,304,583,236,903,222,948,165,808,559,332,123,348,274,797,826,204,144,723,168,738,177,180,919,299,881,250,404,026,184,124,858,368, so the C# output is correct.
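
    As a quick check (a sketch; the class name VerifyMaxValue is just for illustration), the exact integer value of Double.MAX_VALUE can be compared against 2^1024 − 2^971 using java.math.BigInteger:

    import java.math.BigDecimal;
    import java.math.BigInteger;

    public class VerifyMaxValue {
        public static void main(String[] args) {
            // Exact integer value of Double.MAX_VALUE; the BigDecimal(double)
            // constructor converts the binary64 value exactly.
            BigInteger exact = new BigDecimal(Double.MAX_VALUE).toBigIntegerExact();

            // 2^1024 - 2^971, the value derived above.
            BigInteger derived = BigInteger.ONE.shiftLeft(1024)
                    .subtract(BigInteger.ONE.shiftLeft(971));

            System.out.println(exact.equals(derived)); // prints: true
            System.out.println(exact);                 // prints the 309-digit integer
        }
    }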

    The Java output elides significant digits and conceals the true value. This is because the Java specification says that for the default formatting “There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double.”
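
    If the goal is to make Java print every digit as well (a sketch, not the only approach; the class name ExactOutput is illustrative), converting through java.math.BigDecimal sidesteps that rule, because the BigDecimal(double) constructor is exact and toPlainString() emits all digits:

    import java.math.BigDecimal;

    public class ExactOutput {
        public static void main(String[] args) {
            // new BigDecimal(double) captures the binary64 value exactly, so this
            // prints the full 309-digit integer instead of a rounded shortest form.
            System.out.println(new BigDecimal(Double.MAX_VALUE).toPlainString());
        }
    }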