c++ performance c++11 parallel-processing armadillo

C++11 parallelization: bottleneck in Armadillo's set_seed_random()

In a C++11 program, calling arma_rng::set_seed_random() appears to create a bottleneck when several instances run in parallel. Here is a way to reproduce it.

Consider this simple code:

#include <armadillo>    // Load the Armadillo library.
using namespace arma;

int main()
{
    // Loop forever on purpose, so that CPU usage can be observed.
    while (true) {
        arma_rng::set_seed_random();            // Reseed the generator on every iteration.
        double rnd_number = randu<double>();    // Generate a uniform random number.
        (void) rnd_number;                      // Silence the unused-variable warning from -Wall.
    }
}

I compiled it with

g++ -std=c++11 -Wall -g bayesian_estimation.cpp -o bayesian_estimation -O2 -larmadillo

When I run the executable in a terminal, one core handles it at close to 100% CPU. If I start more instances, the CPU% of each process drops, but no additional cores are used, even though they are idle. I illustrate this kind of behavior in detail in this question.

Why is this happening?

Solution

I would assume that Armadillo takes the seed for set_seed_random() from the pool of true random numbers maintained by the OS (e.g. /dev/random on most *nix systems). Filling this pool requires a physical source of entropy (usually the timing of keystrokes, network events, and other interrupts), so the pool is finite and can be drained faster than new entropy arrives.
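
To make that assumption concrete, here is a minimal sketch of what "taking a seed from the OS pool" could look like. It is not Armadillo's actual code: read_seed_from_device is a hypothetical helper, and the device path is whatever you pass in. On a system where the device blocks when the pool is low, the read below is the point where a process would stall.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper, not part of Armadillo: fills `seed` with raw bytes
// read from an entropy device such as /dev/random or /dev/urandom.
// Returns false if the device cannot be opened or fully read.
bool read_seed_from_device(const std::string& device_path, unsigned int& seed)
{
    std::ifstream device(device_path, std::ios::in | std::ios::binary);
    if (!device) {
        return false;  // Device missing or not readable.
    }

    // On a blocking device, this call waits until enough entropy is available.
    device.read(reinterpret_cast<char*>(&seed), sizeof(seed));

    // Succeed only if all requested bytes arrived.
    return device.gcount() == static_cast<std::streamsize>(sizeof(seed));
}
```

Calling such a function in a tight loop, from several processes at once, means they all compete for the same OS-wide resource regardless of how many cores are free.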

In your case, I would assume that a single executable running at full speed depletes the pool at roughly the rate at which new entropy is added. As soon as you start a second, third, ... instance, they all stall while waiting for new random numbers to enter the pool.
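
If that is indeed the cause, the usual remedy is to ask the OS for entropy once and then let a pseudo-random generator produce all subsequent numbers. The sketch below shows the pattern with the standard <random> header as a stand-in for Armadillo, so it compiles without the library. draw_uniform is a hypothetical name chosen for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: seeds a generator exactly once from the OS,
// then draws n uniform numbers from [0, 1) without touching the OS again.
std::vector<double> draw_uniform(std::size_t n)
{
    std::random_device os_entropy;            // Queried a single time for the seed.
    std::mt19937 engine(os_entropy());        // Pseudo-random engine, purely in user space.
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<double> samples;
    samples.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        samples.push_back(dist(engine));      // No reseeding inside the loop.
    }
    return samples;
}
```

In the program from the question, the equivalent change would be to move arma_rng::set_seed_random() above the while loop so that it runs once per process instead of once per iteration.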