Is 1.0 a valid output from std::generate_canonical?

I always thought random numbers would lie between zero and one, without 1, i.e. they are numbers from the half-open interval [0,1). The documention on cppreference.com of std::generate_canonical confirms this.

However, when I run the following program:

#include <iostream>
#include <limits>
#include <random>

int main()
{
    std::mt19937 rng;

    std::seed_seq sequence{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    rng.seed(sequence);
    rng.discard(12 * 629143 + 6);

    float random = std::generate_canonical<float,
                   std::numeric_limits<float>::digits>(rng);

    if (random == 1.0f)
    {
        std::cout << "Bug!\n";
    }

    return 0;
}

It gives me the following output:

Bug!

i.e. it generates me a perfect 1, which causes problems in my MC integration. Is that valid behavior or is there an error on my side? This gives the same output with G++ 4.7.3

g++ -std=c++11 test.c && ./a.out

and clang 3.3

clang++ -stdlib=libc++ -std=c++11 test.c && ./a.out

If this is correct behavior, how can I avoid 1?

Edit 1: G++ from git seems to suffer from the same problem. I am on

commit baf369d7a57fb4d0d5897b02549c3517bb8800fd
Date:   Mon Sep 1 08:26:51 2014 +0000

and compiling with ~/temp/prefix/bin/c++ -std=c++11 -Wl,-rpath,/home/cschwan/temp/prefix/lib64 test.c && ./a.out gives the same output, ldd yields

linux-vdso.so.1 (0x00007fff39d0d000)
libstdc++.so.6 => /home/cschwan/temp/prefix/lib64/libstdc++.so.6 (0x00007f123d785000)
libm.so.6 => /lib64/libm.so.6 (0x000000317ea00000)
libgcc_s.so.1 => /home/cschwan/temp/prefix/lib64/libgcc_s.so.1 (0x00007f123d54e000)
libc.so.6 => /lib64/libc.so.6 (0x000000317e600000)
/lib64/ld-linux-x86-64.so.2 (0x000000317e200000)

Edit 2: I reported the behavior here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63176

Edit 3: The clang team seems to be aware of the problem: http://llvm.org/bugs/show_bug.cgi?id=18767

Solution

The problem is in mapping from the codomain of std::mt19937 (std::uint_fast32_t) to float; the algorithm described by the standard gives incorrect results (inconsistent with its description of the output of the algorithm) when loss of precision occurs if the current IEEE754 rounding mode is anything other than round-to-negative-infinity (note that the default is round-to-nearest).

The 7549723rd output of mt19937 with your seed is 4294967257 (0xffffffd9u), which when rounded to 32-bit float gives 0x1p+32, which is equal to the max value of mt19937, 4294967295 (0xffffffffu) when that is also rounded to 32-bit float.

This is recognized as a defect in the standard, tracked as LWG issue 2524. The paper series P0952 (currently P0952R2) proposes fixing the specification by adopting rejection sampling, which would significantly change behavior by altering the number of calls to the underlying URBG.

The standard could ensure correct behavior if it were to specify that when converting from the output of the URNG to the RealType of generate_canonical, rounding is to be performed towards negative infinity; this would give a correct result in this case. As QOI, it would be good for libstdc++ to make this change.

With this change, 1.0 will no longer be generated; instead the boundary values 0x1.fffffep-N for 0 < N <= 8 will be generated more often (approximately 2^(8 - N - 32) per N, depending on the actual distribution of MT19937).

I would recommend to not use float with std::generate_canonical directly; rather generate the number in double and then round towards negative infinity:

    double rd = std::generate_canonical<double,
        std::numeric_limits<float>::digits>(rng);
    float rf = rd;
    if (rf > rd) {
      rf = std::nextafter(rf, -std::numeric_limits<float>::infinity());
    }

This problem can also occur with std::uniform_real_distribution<float>; the solution is the same, to specialize the distribution on double and round the result towards negative infinity in float.