
Random generators with multiple (uncorrelated?) distributions in c++


Having read the following questions:

using one random engine for multi distributions in c++11

Uncorrelated parallel random seeds with C++ 2011?

std::default_random_engine generate values between 0.0 and 1.0

how to generate uncorrelated random sequences using c++

Using same random number generator across multiple functions

and having experimented with a few tricks, I now have doubts about my conceptual understanding of random generators for multiple (different) distributions in C++. In particular:

  • Is it OK to use one generator for drawing numbers in different distributions (uniform, binomial, ...) as long as you don't multithread?

For instance, assume I'm using the following:

class Zsim {
    private:
     std::default_random_engine engine;
};

and initializing it in the constructor:

Zsim::Zsim(...)
{
    std::random_device rd;
    std::default_random_engine generator(rd());
    engine = generator;
}

and using it to draw n values (n possibly large) from different distributions (binomial and uniform), let's say:

std::binomial_distribution<int> B_distribution(9, 0.5);
int number = B_distribution(engine);

std::uniform_real_distribution<double> R_distribution(0, 15);
position.x = R_distribution(engine);
position.y = R_distribution(engine);

is this considered OK?

Some pointed out that using std::random_device is nice, while others suggested that it can throw for a number of reasons and should be avoided or wrapped in a try/catch (see: Using same random number generator across multiple functions).
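
As a minimal sketch of the try/catch option mentioned above: the helper name make_engine and the clock-based fallback are assumptions made for illustration, not something taken from the linked answers. The idea is only that a failure of std::random_device does not have to abort the simulation.

```cpp
#include <chrono>
#include <exception>
#include <random>

// Hypothetical helper (not from the question): returns a seeded engine and
// falls back to a clock-based seed if std::random_device throws.
std::default_random_engine make_engine()
{
    using seed_type = std::default_random_engine::result_type;
    seed_type seed = 0;
    try {
        std::random_device rd;  // may throw if no entropy source is available
        seed = rd();
    } catch (const std::exception&) {
        // Assumed fallback: a time-based seed, weaker but always available.
        seed = static_cast<seed_type>(
            std::chrono::steady_clock::now().time_since_epoch().count());
    }
    return std::default_random_engine(seed);
}
```

With this, the Zsim constructor could initialize its member with something like engine = make_engine(); whether a clock-based seed is acceptable depends on how much the simulation cares about seed quality.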

  • In "using one random engine for multi distributions in c++11", it was suggested that, when simulating a random walk or Brownian motion in n dimensions (n=2 in the example given by MosteM), you need one generator per dimension; otherwise the coordinates become correlated, producing an artificial drift. While I agree with this assertion, how far does it hold given the (huge) period of the generator? Does it still matter when the simulation is large (a high number of steps)? Should we always use one generator per dimension to be safe? This appears to contradict the lead reply in "how to generate uncorrelated random sequences using c++".

  • Finally, given the Zsim example, when I add a const qualifier to a method and draw from the binomial distribution:

    int Zsim::get_randomB() const
    {
        std::binomial_distribution<int> B_distribution(9, 0.5);
        int number = B_distribution(engine);
        return number;
    }
    

the compiler throws an error: "expression having type 'const std::tr1::default_random_engine' would lose some const-volatile qualifiers in order to call 'unsigned long std::tr1::mersenne_twister<_Ty,_Wx,_Nx,_Mx,_Rx,_Px,_Ux,_Sx,_Bx,_Tx,_Cx,_Lx>::operator()(void)'"

This suggests that the generator 'engine' is altered in some way when the distribution is called. What is causing this?
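
For context, the error is consistent with how the standard engines work: an engine's operator() advances its internal state, so it is a non-const member function and cannot be called on a const object. The following is only a sketch of one common workaround, declaring the member mutable. The seed parameter in the constructor is an assumption added to keep the example short; it is not part of the original class.

```cpp
#include <random>

// Simplified stand-in for the Zsim class in the question.
class Zsim {
public:
    // Assumed constructor taking a seed, for illustration only.
    explicit Zsim(unsigned seed) : engine(seed) {}

    // const method: it may use the engine only because the member is 'mutable'.
    int get_randomB() const
    {
        std::binomial_distribution<int> B_distribution(9, 0.5);
        return B_distribution(engine);  // advances the engine's internal state
    }

private:
    mutable std::default_random_engine engine;
};
```

The alternative is to drop the const qualifier from get_randomB(), which states openly that drawing a number changes the object's state.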


Solution

  • If you read about UniformRandomBitGenerator, you will find that the random generator produces random bits which ideally are pretty much independent of one another, to the extent that the PRNG in question can achieve this. So essentially every call to engine() generates one almost uncorrelated integer. It is the task of the distribution to make the appropriate number of calls to the engine. A single-bit distribution might make one call to a 32-bit engine for every 32 calls to the distribution itself, caching unused entropy between calls. Conversely, a double-precision number generator might use entropy from two 32-bit engine results to determine all 53 mantissa bits of a double. The engine doesn't care which distribution consumes its random bits, so using the same engine with different distributions isn't a problem.

    If you read https://en.wikipedia.org/wiki/Mersenne_Twister you will find that it is

    k-distributed to 32-bit accuracy for every 1 ≤ k ≤ 623 (for a definition of k-distributed, see below)

    So if you use std::mt19937 I'd say you should be safe to use the same engine in up to 623 different distributions, no matter whether they are of the same or different types. For more distributions it depends on how they are to be used, but in most cases I wouldn't worry too much either.
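
To make the first point concrete, here is a small sketch that feeds one std::mt19937 to both distributions from the question, interleaving the draws. The Sample struct and the draw_samples function are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical record type: one binomial draw plus a 2D position.
struct Sample {
    int count;
    double x;
    double y;
};

// Draws n samples, interleaving a binomial and a uniform distribution
// on the same engine. The engine is passed by reference so that its
// state keeps advancing across calls.
std::vector<Sample> draw_samples(std::mt19937& engine, std::size_t n)
{
    std::binomial_distribution<int> B_distribution(9, 0.5);
    std::uniform_real_distribution<double> R_distribution(0.0, 15.0);

    std::vector<Sample> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        Sample s;
        s.count = B_distribution(engine);  // each draw consumes engine output
        s.x = R_distribution(engine);
        s.y = R_distribution(engine);
        out.push_back(s);
    }
    return out;
}
```

Passing the engine by reference matters here: copying it into each function would restart every consumer from the same state and produce identical sequences, which is a far more realistic source of correlation than sharing one engine.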