I need a very fast pseudorandom number generator for a project I've been working on. So far, I've implemented the xorshift algorithm and can generate pseudorandom u64s. However, I need to convert these u64s to a floating point value in the range between 0 and 1.
I've been primarily using this and this as reference.
For some reason I can't get anywhere close to the behavior I want, which is confusing because I've used the exact same method as found here. Despite seeing no difference between the implementations, I get different results.
```rust
#![feature(portable_simd)] // std::simd is nightly-only
use std::simd::Simd;

// Lane-count alias; std's built-in aliases don't go up to 64 lanes of u64.
#[allow(non_camel_case_types)]
type u64x64 = Simd<u64, 64>;

fn main() {
    let seeds: [u64; 64] = core::array::from_fn(|i| i as u64);
    let bitshift12 = u64x64::splat(12);
    let bitshift25 = u64x64::splat(25);
    let bitshift27 = u64x64::splat(27);
    let bitshift52 = u64x64::splat(52);

    let mut random_states = Simd::from(seeds);
    random_states ^= random_states >> bitshift12;
    random_states ^= random_states << bitshift25;
    random_states ^= random_states >> bitshift27;
    random_states = random_states | ((u64x64::splat(1023) + u64x64::splat(0)) << bitshift52);

    let generated = Simd::<f64, 64>::from_bits(random_states);
    println!("{:?}", generated);
}
```
Output:

```
[1.0, 1.0000000074505808, 1.0000000149011616, 1.0000000223517425, 1.0000000298023235, 1.0000000372529039, ...]
```
Obviously I didn't do something properly, as only the last few decimals are "random", not the whole value. Why can't I properly shift these up?

If anyone could point out my error(s) it would be much appreciated.
That sequence looks like what you'd get from stuffing small integers into the mantissas of f64 bit-patterns with the exponent from 1.0, so you get 1.0 plus tiny amounts. But the mantissas aren't as small as 0, 1, 2, 3, ...; https://www.binaryconvert.com/result_double.html?decimal=049046048048048048048048048048055052053048053056048056 shows that 1.0000000074505808 is represented by the f64 bit-pattern 0x3FF0000002000001, which has just 2 bits set in the mantissa.
That looks like the bit-pattern you'd get after one iteration of xorshift starting with seed = 1, though. Note the first shift is to the right: the only set bit shifts out of the copy, leaving 0, so the XOR changes nothing. The next shift is to the left, leading to two set bits. The final right-shift by 27 shifts them both out of the copy, again XORing with 0 and leaving them unmodified.

So your extremely non-random seed of seeds[i] = i, after just one step of xorshift, leads to these non-random mantissas. (And seeds[0] will never become non-zero; xorshift needs non-zero seeds because shift and XOR can never create a non-zero bit from zero.)
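To check this concretely, here's a scalar sketch (stable Rust; function name is mine) tracing one step of the question's shift constants from seed = 1:

```rust
// One xorshift step with the question's constants (12, 25, 27),
// traced from seed = 1 to show why only two mantissa bits end up set.
fn xorshift_step(mut x: u64) -> u64 {
    x ^= x >> 12; // 1 >> 12 == 0, so XOR leaves x == 1
    x ^= x << 25; // sets bit 25: x == 0x0000_0000_0200_0001
    x ^= x >> 27; // both bits shift out of the copy, x unchanged
    x
}

fn main() {
    let m = xorshift_step(1);
    assert_eq!(m, 0x0200_0001); // only bits 0 and 25 set

    // OR in the exponent field of 1.0 and reinterpret, as the question does:
    let f = f64::from_bits(m | (1023u64 << 52));
    assert_eq!(f.to_bits(), 0x3FF0_0000_0200_0001);
    println!("{f:?}"); // 1.0000000074505808, the question's second element
}
```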
If you did have uniform random u64 values (e.g. from using a real seed, or letting the generator run many iterations for the non-zero seeds), ORing them with the exponent from 1.0 would randomize the exponent as well, leading to huge values. But always larger in magnitude than 1.0, except for NaN when the exponent comes out all-ones (or infinity if the mantissa is zero). Also a random sign. OR can't clear bits, and IEEE float magnitude increases monotonically with increasing integer bit-pattern, thanks to the exponent bias. https://en.wikipedia.org/wiki/Double-precision_floating-point_format
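A quick scalar illustration of that claim (helper name is hypothetical): ORing any u64 with 1.0's exponent field can never produce a finite value with magnitude below 1.0:

```rust
// OR can only set exponent bits, so the biased exponent is >= 1023,
// i.e. the unbiased exponent is >= 0 and |value| >= 1.0 (or inf/NaN).
fn or_with_exponent(bits: u64) -> f64 {
    f64::from_bits(bits | (1023u64 << 52))
}

fn main() {
    for &x in &[0u64, 1, 0xDEAD_BEEF_DEAD_BEEF, u64::MAX] {
        let f = or_with_exponent(x);
        // all-ones exponent gives NaN (or infinity with a zero mantissa)
        assert!(f.is_nan() || f.abs() >= 1.0);
    }
}
```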
If you mask the random u64 to keep only the low 52 bits, so you're just randomizing the mantissa, you can easily get uniform random numbers in [1.0, 2.0). As Chux says in the Q&A you linked (How to convert a number of pseudorandom bits into a statistically random floating-point value between 0 and 1?), subtracting 1.0 from that is a standard way to get a number in [0.0, 1.0).
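As a scalar sketch of that mask + OR + subtract approach (the helper name is mine; the same three operations vectorize directly with std::simd on nightly):

```rust
// Map a random u64 to a uniform f64 in [0.0, 1.0):
// keep 52 random mantissa bits, paste in 1.0's exponent to get a
// value in [1.0, 2.0), then subtract 1.0.
fn u64_to_f64_unit(bits: u64) -> f64 {
    const MANTISSA_MASK: u64 = (1u64 << 52) - 1; // low 52 bits
    const EXP_ONE: u64 = 1023u64 << 52;          // exponent field of 1.0
    f64::from_bits((bits & MANTISSA_MASK) | EXP_ONE) - 1.0
}

fn main() {
    assert_eq!(u64_to_f64_unit(0), 0.0);
    // all-ones mantissa gives the largest output, just below 1.0
    assert_eq!(u64_to_f64_unit(u64::MAX), 1.0 - f64::EPSILON);
    for &x in &[1u64, 0xDEAD_BEEF, u64::MAX / 3] {
        assert!((0.0..1.0).contains(&u64_to_f64_unit(x)));
    }
}
```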
The closer to 0.0 (the smaller the exponent), the more trailing zeros the mantissa will have, from cancellation when subtracting two nearby numbers: the smaller the exponent, the closer together the representable values are, but we only want a uniform distribution, and there are only 52 bits of entropy from this method. This is probably fine, but in theory you could check the exponent field and use a variable-count shift + OR to randomize the low mantissa bits.
Chux's other method, a value-preserving conversion (like a C cast) and then division (actually multiplication by an inverse), can't be done as efficiently on x86 without AVX-512 for packed conversion from u64 to f64. See How to efficiently perform double/int64 conversions with SSE/AVX?: it takes multiple instructions, more than for replacing the exponent and subtracting. (And with AVX-512, replacing the exponent field also becomes more efficient, just a single vpternlogd with a bitmask covering the exponent+sign field.)
BTW, this use of SIMD vectors of shift counts doesn't look efficient unless the compiler optimizes the u64x64 bitshift12 back to a scalar immediate. On x86 and AArch64 at least, vector shifts can use a scalar count, so I'd hope that random_states >> 12 would compile to vpsrlq ymm, ymm, 12 (with AVX2), not needing AVX2 variable-count shifts with vector constants for the counts like vpsrlvq ymm, ymm, ymm. (That's one per 2 cycles throughput on Zen 2 vs. one per cycle for immediate-count shifts: https://uops.info/. On Zen 3 and later, and Intel Skylake and later, the throughputs are the same. But if the compiler is actually having to load vectors of counts from an array of 64x u64, that sucks.)
I assume u64x64::splat(1023) + u64x64::splat(0) was there to play with different exponent fields, but why a vector add? Just u64x64::splat((1023 + offset) << 52) will give you the exponent field from 1.0 (with offset = 0), doing all the math with scalar constants that won't even tempt the compiler into doing it at run-time.
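Putting the pieces together, one possible scalar version of the whole pipeline, with the exponent field folded into a compile-time constant as suggested (all names here are illustrative, not the asker's actual code):

```rust
// Exponent math done entirely with scalar constants; the compiler folds
// EXP_FIELD to a single immediate instead of adding splatted vectors.
const EXP_OFFSET: u64 = 0;                        // tweak to play with other exponents
const EXP_FIELD: u64 = (1023 + EXP_OFFSET) << 52; // exponent field of 1.0 when offset = 0

/// One xorshift step, then convert to a uniform f64 in [0.0, 1.0).
fn next_unit_f64(state: &mut u64) -> f64 {
    debug_assert!(*state != 0, "xorshift state must be non-zero");
    *state ^= *state >> 12;
    *state ^= *state << 25;
    *state ^= *state >> 27;
    let mantissa = *state & ((1u64 << 52) - 1); // keep 52 random bits
    f64::from_bits(mantissa | EXP_FIELD) - 1.0
}

fn main() {
    let mut s = 0x9E37_79B9_7F4A_7C15; // some non-zero seed
    for _ in 0..5 {
        let f = next_unit_f64(&mut s);
        assert!((0.0..1.0).contains(&f));
        println!("{f}");
    }
}
```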