Tags: random, boolean, bit-manipulation, mask

Bitwise convert a random integer to FP64 with values between 0 and 1


I am writing a laboratory data-processing script in Excel VBA. The idea is to obtain better random numbers using RtlGenRandom. The difficulty is that RtlGenRandom fills its output byte array byte by byte. So, to save time while keeping the "randomish" distribution, I want to apply a bitwise mask that reduces the number to a value in [0, 1]. Below is a mnemonic explanation of what I want to do, regardless of the exact implementation:

longlong buffer
longlong mask
double rand_val
RtlGenRandom(buffer,8)
buffer=not(buffer or mask)
rand_val=typeless_copy(rand_val, buffer)

So I am a little lost about what the value of that mask should be for the said truncation of values. Bitwise, or in hex.

I thought it should be 0xC000 0000 0000 0000, but something is wrong.


Solution

  • It seems you forgot about two details of IEEE-754 floats:

    The mantissa starts with a hard-coded 1

    When humans write numbers in scientific notation (e.g. 3.4567e89), the leading 3.4567 is called the significand or mantissa. Talking superficially about floats, we often say they store an exponent and a mantissa; but strictly speaking that is wrong. Besides the exponent, they only store the sign and the fraction part of the mantissa. Since a mantissa should always start with a single non-zero digit, and floats use the binary system, there is no point in wasting one bit that would always be 1.

    For most cases, we can think of a float as a form document like ...

                 fraction    exponent>0
     (-1)^[] x 1.[][][][] x 2^([][][] - bias)
         sign
    

    ... where you can fill in the [] boxes, but cannot change the 1. at the beginning of the mantissa.

    Therefore, your approach generated numbers from the interval [0, 2).

    Non-uniform distribution

    Due to the representation with an exponent, there are more floats close to zero. E.g. for 64-bit floats (a.k.a. doubles) the interval [0.5, 1.0) contains 2^52 floats, but the interval (0, 0.5) contains about 2^(10+52) floats (exactly 1022 * 2^52 - 1, counting subnormals). Therefore your numbers are way more likely to be close to 0 than to 1.

    A simple solution

    To solve both problems, let's take a step back. Floats can represent 10^-17 = 0.00000000000000001 pretty closely, but not 0.5 + 10^-17 = 0.50000000000000001. It would be unfair if we generated numbers like the former but not numbers like the latter. Therefore, we should evenly split the interval [0, 1] into numbers that can all be represented exactly.

    Since the biggest ULP in the interval [0, 1) is 2^-53 = ca. 10^-16 = 0.0000000000000001, we can divide that interval into 2^53 evenly spaced floats: (0, 1 * 2^-53, 2 * 2^-53, ..., (2^53 - 1) * 2^-53). To pick one of these floats at random, we generate a random 53-bit integer and divide it by 2^53:

    ' pseudo code for generating one random Double
    ' from [0,1) with uniform distribution
    LongLong buffer
    RtlGenRandom(buffer, 8)
    buffer = buffer and 0x1FFFFFFFFFFFFF  ' clamp to 53 bits    
    Double rand = buffer / CDbl(0x20000000000000)
    

    If you really need to be able to generate the 1 too, there are 2^53 + 1 possible numbers, which is hard to clamp while maintaining uniform distribution. I believe rejection sampling would be the easiest (and maybe even fastest) solution for your case.

    ' pseudo code for generating one random Double
    ' from [0,1] with uniform distribution
    LongLong buffer
    Double rand
    Do
      RtlGenRandom(buffer, 8)
      buffer = buffer and 0x3FFFFFFFFFFFFF  ' clamp to 54 bits
    Loop While buffer > 0x20000000000000
    If buffer == 0x20000000000000 Then
      rand = 1.0
    Else
      rand = buffer / CDbl(0x20000000000000)
    End If
    

    To speed up both approaches, it's probably best to call RtlGenRandom only once on a very big buffer. If you know that you need N random numbers, you can generate 8*N random bytes for the first approach, or about 8*N*2 random bytes (on average) for the second one, since roughly every other draw gets rejected. Since you only need 7 random bytes per LongLong, you could even reduce that total buffer size by 1/8.

    Experiment

    I wrote a small C program (link to online editor) to test that the division actually generates unique floats and there are no x!=y for which x/constant == y/constant due to rounding errors:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    union {
      uint64_t i;
      double f;
    } u;
    
    int main() {
      uint64_t p53 = (UINT64_C(1) << 53);  /* 1L << 53 overflows 32-bit long */
      for (uint64_t magnitude = 2; magnitude <= p53; magnitude <<= 4) {
        for (int offset = -2; offset < 2; ++offset) {
          uint64_t n = magnitude + offset;
          u.f = n / (double) p53;
          printf("%16" PRIu64 "/%16" PRIu64 ".0 = 0x%016" PRIx64
                 " = %-21.13a = %.17e = %.17f\n",
                 n, p53, u.i, u.f, u.f, u.f);
        }
        printf("%16s/\n", "...");
      }
      return 0;
    }
    

    From its output we can see that our steps are evenly spaced and ...

    • Near 0.0, we are skipping multiple floats in one step
    • But near 1.0, our steps go from one float to its direct neighbor.
    • 1.0 + step == 1.0 again, because our steps are too fine-granular for floats > 1.0.