I am currently working on importance sampling, and for testing purposes I need to be able to generate all possible values that uniform_real_distribution<float> may generate for the interval [0, 1] (yes, it is closed on the right too). My idea was to generate integers which I can then convert to floating-point numbers. From the tests I made, there seems to be a perfect bijection between uniform single-precision floats in [0, 1] and integers in [0, 2^24]. (I am a bit bothered by the fact that it is not [0, 2^24 - 1] and I am still trying to figure out why; my best guess is that 0 is simply special for floats, while 1 to 2^24 all result in floats that have the same exponent.) My question is whether the floats generated this way are exactly the floats that uniform_real_distribution<float> can generate. You can find my integer <-> float tests below:
#include <cstdint>
#include <iostream>

void floatIntegerBitsBijectionTest()
{
    // Test 1: i / 2^24 for i in [0, 2^24] should be equidistant and invertible.
    const uint32_t two24 = 1u << 24;
    bool bij24Bits = true;
    float delta = 1.0f / float(two24);
    float prev = 0.0f;
    for (uint32_t i = 1; i <= two24; ++i)
    {
        float uintMap = float(i) / float(two24);
        if (uintMap - prev != delta || uint32_t(uintMap * float(two24)) != i)
        {
            std::cout << "No bijection exists between uniform floats in [0,1] and integers in [0,2^24].\n";
            bij24Bits = false;
            break;
        }
        prev = uintMap;
    }
    if (bij24Bits)
        std::cout << "A bijection exists between uniform floats in [0,1] and integers in [0,2^24].\n";
    std::cout << "\n";

    // Test 2: the same with 2^25 steps; expected to fail just past 2^24,
    // where consecutive floats are further apart than 1 / 2^25.
    const uint32_t two25 = 1u << 25;
    bool bij25Bits = true;
    delta = 1.0f / float(two25);
    prev = 0.0f;
    for (uint32_t i = 1; i <= two25; ++i)
    {
        float uintMap = float(i) / float(two25);
        if (uintMap - prev != delta || uint32_t(uintMap * float(two25)) != i)
        {
            std::cout << "No bijection exists between uniform floats in [0,1] and integers in [0,2^25].\n";
            if (i == (1u << 24) + 1)
                std::cout << "The first non-uniformly distributed float corresponds to the integer 2^24+1.\n";
            bij25Bits = false;
            break;
        }
        prev = uintMap;
    }
    if (bij25Bits)
        std::cout << "A bijection exists between uniform floats in [0,1] and integers in [0,2^25].\n";
    std::cout << "\n";

    // Test 3: the signed variant, i / 2^24 for i in [-2^24, 2^24], mapping to [-1, 1].
    // The loop index must be signed here, so use int32_t (negating an unsigned
    // value would wrap around instead of producing -2^24).
    const int32_t stwo24 = 1 << 24;
    bool bij25BitsS = true;
    delta = 1.0f / float(stwo24);
    prev = -1.0f;
    for (int32_t i = -stwo24 + 1; i <= stwo24; ++i)
    {
        float uintMap = float(i) / float(stwo24);
        if (uintMap - prev != delta || int32_t(uintMap * float(stwo24)) != i)
        {
            std::cout << i << " " << uintMap - prev << " " << delta << "\n";
            std::cout << "No bijection exists between uniform floats in [-1,1] and integers in [-2^24,2^24].\n";
            bij25BitsS = false;
            break;
        }
        prev = uintMap;
    }
    if (bij25BitsS)
        std::cout << "A bijection exists between uniform floats in [-1,1] and integers in [-2^24,2^24].\n";
}
EDIT:
Somewhat relevant:
http://xoroshiro.di.unimi.it/random_real.c
https://lemire.me/blog/2017/02/28/how-many-floating-point-numbers-are-in-the-interval-01/
EDIT 2:
I finally managed to figure out what uniform_real_distribution<float> does, at least when used with the mt19937 engine with its default template arguments (I am talking about the implementation that comes with VS2017). Sadly, it simply generates a random integer in [0, 2^32 - 1], casts it to float, and then divides it by 2^32. Needless to say, this produces non-uniformly distributed floating-point numbers. I am guessing, however, that this works for most practical purposes, unless one is working close to the precision of the deltas between the generated numbers.
I will assume the C++ implementation uses the IEEE-754 32-bit basic binary format for float. In this format, the representable floating-point values in [1, 2] are regularly spaced, at a distance of 2^−23.
Define x with:

std::uniform_real_distribution<float> x(1, 2);
Then, assuming uniform_real_distribution is well implemented and a proper engine is used, x(engine) - 1 will generate values equal to n / 2^23 for integers n in [0, 2^23), with uniform distribution.
I have misgivings about the specification of uniform_real_distribution in C++. It is defined in terms of real arithmetic. The requirement that it return values with constant probability density requires a continuous set of numbers, which the floating-point format does not provide. Additionally, I am not sure how implementations handle the endpoints.
Since the distribution has been forced to be discrete, one might as well use uniform_int_distribution and multiply the samples by 2^−23 (available as numeric_limits<float>::epsilon()). This has the benefit of clarifying the endpoints and easily supporting intervals of [0, 1) or [0, 1], as desired.
Even if the C++ implementation does not use IEEE-754, the representable values in [1, 2] should be evenly spaced, due to the C++ standard's description of floating-point values as represented by some number of digits¹ in a certain base, multiplied by the base raised to some power. For the power zero, the values from 1 to 2 would be spaced according to the value of the least significant digit in the format. As above, that distance would be numeric_limits<float>::epsilon().
¹ The C++ standard uses the legacy term “mantissa,” but the preferred term is “significand.”