c++ random floating-point precision uniform-distribution

uniform_real_distribution<float> all possible values generation


I am currently working on importance sampling, and for testing purposes I need to be able to generate every value that uniform_real_distribution<float> may produce for the interval [0,1] (yes, it is closed on the right too). My idea is to generate integers and then convert them to floating-point numbers. From my tests, there seems to be a perfect bijection between uniformly spaced single-precision floats in [0,1] and integers in [0,2^24]. (It bothers me a bit that the range is not [0,2^24-1], and I am still trying to figure out why; my best guess is that 0 is simply special for floats and that 1 to 2^24 all result in floats that have the same exponent.) My question is whether the floats generated this way are exactly the floats that uniform_real_distribution<float> can generate. My integer <-> float tests are below:

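#include <cstdint>
#include <iostream>

using uint32 = std::uint32_t; // the snippet assumes a 32-bit unsigned integer type

void floatIntegerBitsBijectionTest()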
{
    uint32 two24 = 1 << 24;
    bool bij24Bits = true;
    float delta = float(1.0) / float(two24);
    float prev = float(0) / float(two24);
    for (uint32 i = 1; i <= two24; ++i)
    {
        float uintMap = float(i) / float(two24);
        if (uintMap - prev != delta || uint32(uintMap*float(two24)) != i)
        {
            std::cout << "No bijection exists between uniform floats in [0,1] and integers in [0,2^24].\n";
            bij24Bits = false;
            break;
        }
        prev = uintMap;
    }
    if(bij24Bits) std::cout << "A bijection exists between uniform floats in [0,1] and integers in [0,2^24].\n";
    std::cout << "\n";

    uint32 two25 = 1 << 25;
    bool bij25Bits = true;
    delta = float(1.0) / float(two25);
    prev = float(0) / float(two25);
    for (uint32 i = 1; i <= two25; ++i)
    {
        float uintMap = float(i) / float(two25);
        if (uintMap - prev != delta || uint32(uintMap*float(two25)) != i)
        {
            std::cout << "No bijection exists between uniform floats in [0,1] and integers in [0,2^25].\n";
            if (i == ((1 << 24) + 1)) std::cout << "The first non-uniformly distributed float corresponds to the integer 2^24+1.\n";

            bij25Bits = false;
            break;
        }
        prev = uintMap;
    }
    if (bij25Bits) std::cout << "A bijection exists between uniform floats in [0,1] and integers in [0,2^25].\n";
    std::cout << "\n";


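    // Use a signed copy of 2^24: negating the unsigned two24 wraps around,
    // and comparing a negative int against it would skip the loop entirely.
    const int sTwo24 = 1 << 24;
    bool bij25BitsS = true;
    delta = 1.0f / float(two24);
    prev = float(-sTwo24) / float(two24);
    for (int i = -sTwo24 + 1; i <= sTwo24; ++i)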
    {
        float uintMap = float(i) / float(two24);
        if (uintMap - prev != delta || int(uintMap*float(two24)) != i)
        {
            std::cout << i << " " << uintMap - prev << " " << delta << "\n";
            std::cout << "No bijection exists between uniform floats in [-1,1] and integers in [-2^24,2^24].\n";
            bij25BitsS = false;
            break;
        }
        prev = uintMap;
    }
    if (bij25BitsS) std::cout << "A bijection exists between uniform floats in [-1,1] and integers in [-2^24,2^24].\n";
}

EDIT:

Somewhat relevant:

https://crypto.stackexchange.com/questions/31657/uniformly-distributed-secure-floating-point-numbers-in-0-1

http://xoroshiro.di.unimi.it/random_real.c

https://www.reddit.com/r/programming/comments/29ducz/obtaining_uniform_random_floats_is_trickier_than/

https://lemire.me/blog/2017/02/28/how-many-floating-point-numbers-are-in-the-interval-01/

EDIT 2:

I finally managed to figure out what uniform_real_distribution<float> does, at least with the mt19937 engine and its default template arguments (I am talking about the implementation that ships with VS2017). Sadly, it simply generates a random integer in [0,2^32-1], casts it to float, and then divides it by 2^32. Since a float has only a 24-bit significand, the cast has to round most of those integers, so the resulting floating-point numbers are not uniformly distributed. I am guessing, however, that this works for most practical purposes unless one is working close to the precision of the deltas between generated numbers.
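
To make the rounding concrete, here is a minimal sketch of the arithmetic described above. It is not the VS2017 source; scaledFromU32 is a hypothetical name, and I am assuming IEEE-754 single precision with the default round-to-nearest mode.

```cpp
#include <cstdint>

// Hypothetical helper: the "cast to float, divide by 2^32" scheme described above.
// Assumes IEEE-754 binary32 and round-to-nearest.
float scaledFromU32(std::uint32_t r)
{
    // 2^32 is a power of two, so the division itself is exact;
    // any loss of information comes from the integer-to-float cast.
    return static_cast<float>(r) / 4294967296.0f;
}

// Returns true if two different integers end up as the same float.
bool collide(std::uint32_t a, std::uint32_t b)
{
    return scaledFromU32(a) == scaledFromU32(b);
}
```

With this helper, collide(0x80000000u, 0x80000001u) is true because floats near 2^31 are 256 apart, while collide(1u, 2u) is false. The largest input, 0xFFFFFFFFu, rounds up to 2^32 in the cast, so the scheme as written maps it to 1.0f.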


Solution

  • I will assume the C++ implementation uses the IEEE-754 32-bit basic binary format for float. In this format, the representable floating-point values in [1, 2] are regularly spaced, at a distance of 2^-23.

    Define x with:

    std::uniform_real_distribution<float> x(1, 2);
    

    Then, assuming uniform_real_distribution is well implemented and a proper engine is used, x(engine) - 1 will generate values equal to n / 2^23 for integers n in [0, 2^23), with uniform distribution.
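
    As a rough sanity check of that claim, the sketch below draws samples of x(engine) - 1 and tests that each one lies on the n / 2^23 grid. The function names, the seed, and the sample count are my own choices, and I assume IEEE-754 binary32. The check accepts n = 2^23 as well, in case an implementation returns the upper endpoint.

    ```cpp
    #include <cmath>
    #include <random>

    // True if v equals n / 2^23 for some integer n in [0, 2^23].
    // Assumes IEEE-754 binary32, where scaling by 2^23 is exact for v in [0, 1].
    bool isOnGrid(float v)
    {
        const float scaled = std::ldexp(v, 23); // v * 2^23
        return scaled >= 0.0f && scaled <= 8388608.0f && scaled == std::floor(scaled);
    }

    // Draws `count` samples of x(engine) - 1 and reports whether all are on the grid.
    bool allSamplesOnGrid(unsigned seed, int count)
    {
        std::mt19937 engine(seed);
        std::uniform_real_distribution<float> x(1.0f, 2.0f);
        for (int i = 0; i < count; ++i)
        {
            // The subtraction is exact because x(engine) lies in [1, 2].
            if (!isOnGrid(x(engine) - 1.0f)) return false;
        }
        return true;
    }
    ```

    Calling allSamplesOnGrid(12345u, 100000) only checks that the samples land on the grid; it says nothing about whether they are uniformly distributed over it.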

    Notes

    I have misgivings about the specification of uniform_real_distribution in C++. It is defined in terms of real arithmetic. The requirement that it return values with constant probability density requires a continuous set of numbers, which the floating-point format does not provide. Additionally, I am not sure how implementations will handle endpoints.

    Since the distribution has been forced to be discrete, one might as well use uniform_int_distribution and multiply the samples by 2^-23 (available as numeric_limits<float>::epsilon()). This has the benefit of clarifying the endpoints and easily supporting intervals of [0, 1) or [0, 1], as desired.
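
    A minimal sketch of that approach follows. gridUniform and its closedRight parameter are names I made up for illustration, and the code assumes float is IEEE-754 binary32, so that epsilon() is 2^-23.

    ```cpp
    #include <cstdint>
    #include <limits>
    #include <random>

    // Uniform sample from the grid n * 2^-23.
    // closedRight == false gives [0, 1); closedRight == true gives [0, 1].
    // Assumes IEEE-754 binary32, where epsilon() == 2^-23.
    template <class Engine>
    float gridUniform(Engine& engine, bool closedRight)
    {
        constexpr std::int32_t steps = std::int32_t{1} << 23;
        std::uniform_int_distribution<std::int32_t> n(0, closedRight ? steps : steps - 1);
        // Both the int-to-float conversion and the multiplication are exact here,
        // since n fits in 24 bits and epsilon() is a power of two.
        return static_cast<float>(n(engine)) * std::numeric_limits<float>::epsilon();
    }
    ```

    For example, with std::mt19937 engine(42), gridUniform(engine, false) never returns 1.0f, while gridUniform(engine, true) can.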

    Even if the C++ implementation does not use IEEE-754, representable values in [1, 2] should be evenly spaced, due to the description in the C++ standard of floating-point values as represented by some number of digits in a certain base, multiplied by the base raised to some power. For the power zero, the values from 1 to 2 would be spaced according to the value of the least significant digit in the format. As above, that distance would be numeric_limits<float>::epsilon().
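
    If you want to confirm the spacing on a particular implementation, something like the following should do. It is only a sketch; the function name is hypothetical. It walks every representable float from 1 up to 2 with std::nextafter and compares each gap with epsilon().

    ```cpp
    #include <cmath>
    #include <limits>

    // Checks that consecutive representable floats in [1, 2] are exactly
    // numeric_limits<float>::epsilon() apart. For binary32 this is 2^23 steps.
    bool spacingInOneTwoIsEpsilon()
    {
        const float eps = std::numeric_limits<float>::epsilon();
        float v = 1.0f;
        while (v < 2.0f)
        {
            const float next = std::nextafter(v, 2.0f); // next representable value toward 2
            if (next - v != eps) return false;
            v = next;
        }
        return true;
    }
    ```

    The loop relies only on numeric_limits and nextafter, so it does not itself assume IEEE-754.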

    Footnotes

    1 The C++ standard uses the legacy term “mantissa,” but the preferred term is “significand.”