Tags: random, boolean, bit-manipulation, mask

Bitwise convert a random integer to FP64 with values between 0 and 1


I am writing a laboratory data-processing script in Excel VBA. The idea is to obtain better random numbers using RtlGenRandom. The difficulty is that RtlGenRandom fills its output byte array byte by byte. So, to save time while keeping the "randomish" distribution, I want to apply a bitwise mask that reduces the number to a value in [0, 1]. Below is a mnemonic explanation of what I want to do, regardless of the exact implementation:

longlong buffer
longlong mask
double rand_val
RtlGenRandom(buffer,8)
buffer=not(buffer or mask)
rand_val=typeless_copy(rand_val, buffer)

So I am a little lost about what the value of that mask should be for the said truncation of values. Bitwise, or in hex.

I thought it should be 0xC000 0000 0000 0000, but something is wrong.


Solution

  • It seems you forgot about two details of IEEE-754 floats:

    The mantissa starts with a hard-coded 1

    When humans write numbers in scientific notation (e.g. 3.4567e89), the leading 3.4567 is called the significand or mantissa. Talking superficially about floats, we often say they store an exponent and a mantissa; but strictly speaking that is wrong. Besides the exponent, they only store the sign and the fraction part of the mantissa. Since a mantissa should always start with a single non-zero digit, and floats use the binary system, there is no point in wasting one bit that would always be 1.

    For most cases, we can think of a float as a form document like ...

                 fraction    exponent>0
     (-1)^[] x 1.[][][][] x 2^([][][] - bias)
         sign
    

    ... where you can fill in the [] boxes, but cannot change the 1. at the beginning of the mantissa.

    Therefore, your approach generated numbers from the interval [0, 2).

    Non-uniform distribution

    Due to the representation with an exponent, there are more floats close to zero. E.g. for 64-bit floats (a.k.a. doubles) the interval [0.5, 1.0) contains 2^52 floats, but the interval (0, 0.5) contains about 2^(10+52) floats (exactly 1022 * 2^52 - 1, counting subnormals). Therefore your numbers are way more likely to be close to 0 than to 1.

    A simple solution

    To solve both problems, let's take a step back. Floats can represent 10^-17 = 0.00000000000000001 pretty closely, but not 0.5 + 10^-17 = 0.50000000000000001. It would be unfair if we generated numbers like the former but not numbers like the latter. Therefore, we should evenly split the interval [0, 1] into numbers that can all be represented exactly.

    Since the biggest ULP in the interval [0, 1) is 2^-53 = ca. 10^-16 = 0.0000000000000001, we can divide that interval into 2^53 evenly spaced floats: (0, 1 * 2^-53, 2 * 2^-53, ..., (2^53 - 1) * 2^-53). To pick one of these floats at random, we generate a random 53-bit integer and divide it by 2^53:

    ' pseudo code for generating one random Double
    ' from [0,1) with uniform distribution
    LongLong buffer
    RtlGenRandom(buffer, 8)
    buffer = buffer and 0x1FFFFFFFFFFFFF  ' clamp to 53 bits    
    Double rand = buffer / CDbl(0x20000000000000)
    

    If you really need to be able to generate the 1 too, there are 2^53 + 1 possible numbers, which is hard to clamp while maintaining uniform distribution. I believe rejection sampling would be the easiest (and maybe even fastest) solution for your case.

    ' pseudo code for generating one random Double
    ' from [0,1] with uniform distribution
    LongLong buffer
    Double rand
    Do
      RtlGenRandom(buffer, 8)
      buffer = buffer and 0x3FFFFFFFFFFFFF  ' clamp to 54 bits
    Loop While buffer > 0x20000000000000
    If buffer == 0x20000000000000 Then
      rand = 1.0
    Else
      rand = buffer / CDbl(0x20000000000000)
    End If
    

    To speed up both approaches, it's probably best to call RtlGenRandom only once on a very big buffer. If you know that you need N random numbers, you can generate 8*N random bytes for the first approach, or about 8*N*2 random bytes (on average) for the second one, since roughly every other draw gets rejected. Since you only need 7 random bytes per LongLong, you could even reduce that total buffer size by 1/8.

    Experiment

    I wrote a small C program (link to online editor) to test that the division actually generates unique floats and there are no x!=y for which x/constant == y/constant due to rounding errors:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    union {
      uint64_t i;
      double f;
    } u;
    
    int main() {
      uint64_t p53 = (UINT64_C(1) << 53);  /* 1L << 53 overflows 32-bit long */
      for (uint64_t magnitude = 2; magnitude <= p53; magnitude <<= 4) {
        for (int offset = -2; offset < 2; ++offset) {
          uint64_t n = magnitude + offset;
          u.f = n / (double) p53;
          printf("%16" PRIu64 "/%16" PRIu64 ".0 = 0x%016" PRIx64
                 " = %-21.13a = %.17e = %.17f\n",
                 n, p53, u.i, u.f, u.f, u.f);
        }
        printf("%16s/\n", "...");
      }
      return 0;
    }
    

    From its output we can see that our steps are evenly spaced and ...

    • Near 0.0, we are skipping multiple floats in one step
    • But near 1.0, our steps go from one float to its direct neighbor.
    • 1.0 + step == 1.0 again, because our steps are too fine-granular for floats > 1.0.