Tags: java, random, floating-point, double, ieee-754

Determining the Upper Limit of Math.random() Output in Java


I'm seeking clarification on the upper limit of the numbers generated by the Math.random() method in Java. While I understand that the range of this method is "greater than or equal to 0.0 and less than 1.0", I'm interested in precisely determining the largest number that can be generated by the method.

Through experimentation with doubles using the IEEE 754 Converter tool, I've found that the largest number less than 1.0 represented by a double appears to be approximately 0.999999940395355224609375. However, this conclusion is based solely on tinkering.

Could someone confirm whether this is indeed the upper limit of Math.random() output? Additionally, I would appreciate insights or a method to formally prove that this value represents the largest number smaller than 1.0 in IEEE 754 floating-point representation.


Solution

  • I would appreciate insights or a method to formally prove that this value represents the largest number smaller than 1.0 in IEEE 754 floating-point representation.

    Java as a word means many things; in many ways it's 'just' a spec and not an implementation. So how does one prove what the Math.random() function does? Well, that depends on what you mean by 'Java', but mostly it is not possible, and we need to delve into what 'up to but not including 1.0' means. Fortunately, the actual JDK 21 source's javadoc explicitly includes this API note:

         * @apiNote
         * As the largest {@code double} value less than {@code 1.0}
         * is {@code Math.nextDown(1.0)}, a value {@code x} in the closed range
         * {@code [x1,x2]} where {@code x1<=x2} may be defined by the statements
    
    

    Which rather suggests that, indeed, the largest value it can generate is Math.nextDown(1.0), locking down what 'up to but not including 1.0' means. Unfortunately, API notes aren't part of the spec; this is merely suggestive.
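As a quick sanity check (a small sketch, not part of the spec argument), you can print what Math.nextDown(1.0) actually is and confirm it equals (2^53 - 1) / 2^53:

```java
import java.math.BigDecimal;

class NextDownDemo {
    public static void main(String[] args) {
        double d = Math.nextDown(1.0);
        // Bit pattern of the largest double below 1.0: 3FEFFFFFFFFFFFFF
        System.out.printf("%X%n", Double.doubleToLongBits(d));
        // new BigDecimal(double) shows the exact decimal value, no rounding
        System.out.println(new BigDecimal(d));
        // It is exactly (2^53 - 1) / 2^53
        System.out.println(d == (double) ((1L << 53) - 1) / (1L << 53)); // true
    }
}
```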

    So, we need to waffle on two points before we get into the meat and potatoes of it:

    • Java does not actually quite promise perfect IEEE 754 math; that's why the keyword strictfp exists. As far as I'm aware it does nothing on any modern combination of architecture/OS/JVM release you are ever likely to run into, and it's all IEEE 754 anyway. Note that library methods such as Math.random() are not strictfp, so it's fortunate this doesn't matter. Still, if you're asking 'how do I formally prove that the Java spec guarantees that the largest number Math.random() could possibly generate on a spec-conforming implementation is the largest IEEE 754 double less than 1.0?', the answer is: you cannot. We must presume 'a JDK that uses actual IEEE 754 math'.

    And while we're adding caveats, let's dodge the whole 'what do the words in the API docs mean' issue and go straight to analysing what the current code does. From the java.util.Random source code as of 2024-03-07:

    Line 697-700:

      @Override
      public double nextDouble() {
          return (((long)(next(Double.PRECISION - 27)) << 27) + next(27)) * DOUBLE_UNIT;
      }
    

    That's the key code we need to analyse. Note:

    • next(int) returns a sequence of random bits - the int parameter says how many bits to return. E.g. next(2) returns, with equal probability, one of: 0, 1, 2, or 3.
    • Double.PRECISION is, as one might expect, 53 - the number of significand bits of an IEEE 754 double (52 stored bits plus the implicit leading bit). See Wikipedia on IEEE 754.
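Both bullet points are easy to check in a scratch file (a quick sketch; note that the Double.PRECISION constant only exists since JDK 19):

```java
class PrecisionCheck {
    public static void main(String[] args) {
        // Significand width of an IEEE 754 binary64, implicit bit included
        System.out.println(Double.PRECISION);      // 53
        // The other half of the 26/27 split used by nextDouble()
        System.out.println(Double.PRECISION - 27); // 26
    }
}
```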

    That whole code boils down to the much simpler:

    return next(53) * DOUBLE_UNIT;
    

    So why is it doing this weirdo song and dance routine where it generates 26 random bits, shifts them left by 27, then generates 27 more random bits? Because next returns an int (at most 32 bits), so two calls are needed, and the code simply splits the 53 bits roughly down the middle. It's a distraction with no effect on the math, so it's JUST: generate 'significand width' many random bits, then multiply that by DOUBLE_UNIT, which is:

        private static final double DOUBLE_UNIT = 0x1.0p-53; // 1.0 / (1L << 53)
    
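That hexadecimal floating-point literal can be checked against the plain division in the comment (a trivial sketch):

```java
class DoubleUnitCheck {
    public static void main(String[] args) {
        // 0x1.0p-53 is the hex-float literal for 2^-53
        double unit = 0x1.0p-53;
        // Both sides are exactly 2^-53; division by a power of two is exact
        System.out.println(unit == 1.0 / (1L << 53)); // true
    }
}
```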

    Let's now posit:

    • We trust next(int) - in that we trust that it does what it says and each bit it returns is indeed exactly as likely to be a '1' bit as it is to be a '0' bit with no dependencies (the bit generated now has zero predictive value to attempt to predict the bit it will generate next).

    Then we have a number in long form between 0 and 2^53-1, uniformly distributed. We now multiply that number by the double 1.0/2^53. Forgetting IEEE 754 for a moment, this indeed gets us a number between 0 and 1: there are 2^53 different options, ranging from 0/2^53 (i.e. 0) all the way up to at most (2^53-1)/2^53, which is very slightly smaller than 1.0 - exactly 1/2^53 smaller than it.
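That endpoint arithmetic can be verified with exact decimal math (a sketch using BigDecimal, which divides exactly here because the divisor is a power of two):

```java
import java.math.BigDecimal;

class RangeCheck {
    public static void main(String[] args) {
        BigDecimal twoPow53 = BigDecimal.valueOf(1L << 53);
        // The largest value next(53) can yield, as an exact fraction of 2^53
        BigDecimal max = BigDecimal.valueOf((1L << 53) - 1).divide(twoPow53);
        // The gap between that largest output and 1.0 is exactly 1/2^53
        BigDecimal gap = BigDecimal.ONE.subtract(max);
        System.out.println(gap.compareTo(BigDecimal.ONE.divide(twoPow53)) == 0); // true
    }
}
```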

    Now bring IEEE 754 back in: Java will silently convert that long to a double before doing the math, because the only bytecode that can do this multiplication is DMUL. You can write this code, let javac compile it, and run javap to check: indeed, the opcodes generated are L2D (long-to-double) followed by DMUL (multiply two doubles).

    Is that 'safe'? IEEE 754 says yes: the significand is 53 bits, so any number expressible in 53 bits 'fits' without loss in the conversion. Will the bytecode L2D do it without loss? I do not know how to 'formally' prove such a thing if Java is just a bunch of spec docs. The docs say so, and all the JVM implementations I can get my hands on behave as you'd expect.
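The 'fits in 53 bits' claim can at least be exercised directly (a sketch; this checks the round-trip on your JVM rather than proving anything about the spec):

```java
class L2DCheck {
    public static void main(String[] args) {
        long max53 = (1L << 53) - 1;           // largest 53-bit value
        double d = (double) max53;             // the implicit L2D conversion
        System.out.println((long) d == max53); // true: nothing was lost
        // One significant bit more and the round-trip starts failing:
        long tooBig = (1L << 53) + 1;          // 54 significant bits, rounds to 2^53
        System.out.println((long) (double) tooBig == tooBig); // false
    }
}
```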

    We now get to DMUL and again we're stuck and can't formally prove anything, because the definition of what DMUL does is Java Virtual Machine Specification §6.5.dmul - and that is a bunch of English text. But the spec does say it is 'governed by the rules of IEEE 754 arithmetic', and refers to §2.8 if rounding comes up (it should not come up here, as per IEEE 754 rules this multiplication is exact).

    We need to make a spec-to-formality leap here: read up on what IEEE 754 defines, and trust that our JVM does precisely what IEEE 754 says. Except that is not quite how it works: DMUL will pretty much just ask the CPU to do it; the JVM implementation fobs off the requirement to follow IEEE 754 to the underlying architecture. That's why strictfp exists: if that fob-off is imprecise because the underlying architecture is known not to actually do IEEE 754 math, the JVM would need to implement DMUL with something hundreds of times slower, laboriously applying IEEE 754 itself instead of using the CPU's opcode, and that's just not worth it. So strictfp exists as a safety valve: a JVM will just ask the CPU, unless you use strictfp, in which case a CPU that cannot be trusted to get it exactly right won't be relied on. In practice, either all CPUs get it exactly right so strictfp does nothing, or the OpenJDK implementations don't care and strictfp does nothing because nobody could be bothered. I cannot help you on this part, nor can the spec.

    We now need to delve into what Intel, AMD, apple's chip design team, amazon's design team (who also have ARM-based chips that JVMs run on), and so forth promise about their implementation of their CPUs.

    But, assuming it's "we follow IEEE 754, exactly", we can look at what IEEE 754 dictates, and that's easy: a long whose top 11 bits (64 - 53 = 11) are zero converts to a double with no loss, and multiplying such a double by DOUBLE_UNIT (2^-53, i.e. dividing by 2^53) is also lossless - it only shifts the exponent.
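Under that assumption, the whole chain can be checked in two lines (a sketch; DOUBLE_UNIT here is a local copy of the private constant from java.util.Random):

```java
class MulCheck {
    public static void main(String[] args) {
        double DOUBLE_UNIT = 0x1.0p-53; // same value as Random's private constant
        // Largest possible 53-bit value times 2^-53: L2D then DMUL, both exact
        double v = ((1L << 53) - 1) * DOUBLE_UNIT;
        System.out.println(v == Math.nextDown(1.0)); // true
        System.out.println(v < 1.0);                 // true
    }
}
```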

    Let's 'fake' it out and write this code, FORCING the generation of this largest possible value:

    import java.util.*;
    
    class Test {
      public static void main(String[] args) {
        Random r = new Random() {
          @Override
          protected int next(int bits) {
            // Return all 1-bits: the largest value next(bits) can produce
            return (1 << bits) - 1;
          }
        };
        double v = r.nextDouble();
        System.out.printf("%f -> %X equal to nextDown? %s %s\n",
           v, Double.doubleToLongBits(v),
           v == Math.nextDown(1.0) ? "YES!" : "NO",
           v == 1.0 ? "1-exactly" : "not-1");
      }
    }
    

    This dutifully reports, on my JVM/arch/OS combo anyway:

    1,000000 -> 3FEFFFFFFFFFFFFF equal to nextDown? YES! not-1
    

    Note that %f rounded Math.nextDown(1.0) up to 1.000000, but the value is not 1.0, as the bit-pattern print shows. We're witnessing that %f with no precision specifier rounds to six decimal places - not that Math.random() can generate an actual literal 1.0.
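You can see both behaviours side by side (a sketch; Locale.ROOT is used to avoid the locale-dependent decimal comma in the output above):

```java
import java.util.Locale;

class FormatDemo {
    public static void main(String[] args) {
        double v = Math.nextDown(1.0);
        // %f defaults to 6 fraction digits, so it rounds up to 1.000000
        System.out.println(String.format(Locale.ROOT, "%f", v));
        // With more digits requested, the value is visibly below 1.0
        System.out.println(String.format(Locale.ROOT, "%.17f", v));
        System.out.println(v < 1.0); // true
    }
}
```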