Search code examples
c++randomfloating-point

`std::uniform_real_distribution` lacking diversity in exponent


I'm trying to generate random floating point numbers with a rather large dynamics but I'm failing to get variability in the numbers exponents.

My snippet is the following:

#include <cstddef>
#include <iostream>
#include <limits>
#include <random>

using fp_t = float;

// without downscale random generator saturating to inf on full range
constexpr fp_t downscale = 2.;

int main() {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<fp_t> dis(
        std::numeric_limits<fp_t>::lowest() / downscale,
        std::numeric_limits<fp_t>::max() / downscale);
    for (std::size_t i = 0; i != 100; ++i) {
        std::cout << dis(gen) << '\n';
    }
}

And here is a typical output:

7.75099e+37
8.72085e+37
7.23914e+37
-1.21691e+38
-3.26131e+37
-2.25627e+36
2.37054e+37
3.71767e+37
-1.18834e+38
...

LIVE

The exponent stays close to the distribution bounds exponent.

1- Is it a normal behavior for std::uniform_real_distribution?
2- How can it be improved (by giving diversity in orders of magnitude)?

For 2 I'm thinking either to generate mantissa and exponent separately and recombine them thereafter or using integer random generation within the memory location of a floating point variable (but, as far as I know, it's UB, except if I cast the address to char * and set each byte individually).


Solution

  • The set of numbers whose scientific notation has e+37 has a measure 10× greater than the set of numbers whose scientific notation has e+36. A uniform distribution will of course bias towards numbers with "larger exponents", because there is simply a greater probability mass.

    If you want uniformity in the exponents, you don't want a uniform distribution. If the exponents of the random variable were uniform, then the log of the random variable would be uniform:

    std::uniform_real_distribution<fp_t> lg_dis(std::log2(std::numeric_limits<fp_t>::denorm_min()), std::log2(std::numeric_limits<fp_t>::max() / downscale));
    std::uniform_int_distribution<> sgn_dis(0, 1);
    auto dis = [&](auto& gen) {
        fp_t lg = lg_dis(gen);
        auto sgn = sgn_dis(gen);
        return std::exp2(lg) * (sgn ? -1 : +1);
    };
    for (std::size_t i = 0; i != 100; ++i) {
        std::cout << dis(gen) << '\n';
    }