Search code examples
c++random

std::binomial_distribution hangs forever with certain inputs


I have encountered some odd behavior with std::binomial_distribution when compiling with clang++ (with libstdc++ standard library).

Consider the following simple program:

#include <ctime>
#include <random>
#include <iostream>

unsigned binom(unsigned n, double p) {
  std::mt19937 e(time(NULL));
  std::binomial_distribution<unsigned> b(n, p);
  return b(e);
}

int main() {

  std::cout << "sample1=" << binom(1073741823, 0.51174692866744709) << "\n";
  std::cout << "sample2=" << binom(1073741824, 0.51174692866744709) << "\n";

}

This program will output one line (sample1=511766586\n) and then hang indefinitely.

Have I somehow invoked undefined behavior? This appears to happen regardless of what the PRNG returns. No matter how I seed it my main hangs on this second line.


Solution

  • I debugged the GCC implementation of binomial_distribution (_M_initialize, operator()), and this is what I found:

    Because of an overflow of the unsigned parameter n (2^30) the variable _M_s2 of the __param object becomes inf and therefore the same happens to __s2s, _M_s of the same object and __u, __a12, __y in operator()

    This leads to the following infinite loop in operator()

    bool __reject;
    do
    {
        if (__u <= __a1)                        inf <= val --> false
        {
            [...]
        }
        else if (__u <= __a12)                  inf <= inf --> true
        {
            __reject = __y >= [...];            inf >= val --> true
        }
        __reject = __reject || [...];           true || bool --> true
        __reject |= [...];                      true | val --> truthy
    }
    while(__reject);
    

    Here is a (partial) traceback of how those variables got to equal inf:

    _M_t = n
    
    _M_s2 = [...] * (1 + [double] / (4 * _M_t * [...]));
                                     ^^^^^^^^ overflow == 0
                         [double] / 0 == inf
    
    __s2s = _M_s2 * _Ms2;
    
    _M_s = [...] + __s2s * [...];
    
    
    __u = __param._M_s * [...];
    
    __a12 = [...] + __param._M_s2 * [...];
    
    __y = __param._M_s2 * [...];
    

    Is also worth noting that __d2x in the __param object is NaN and that the other variables that contribute to this process, of which I omitted the definition, have (at least in this context) valid values

    A feasible solution (until the bug is fixed) would be using std::binomial_distribution<unsigned long> (or uint64_t) in place of unsigned


    Update: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114359