So I saw a talk called rand() Considered Harmful and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand()
plus modulus paradigm.
However, I wanted to see the failings of std::rand()
firsthand so I did a quick experiment:
getRandNum_Old()
and getRandNum_New()
that generated a random number between 0 and 5 inclusive using std::rand()
and std::mt19937
+std::uniform_int_distribution
respectively.Here were the results:
[OLD WAY]
Spread
mean: 346.554406
std dev: 110.318361
Time Taken (ms)
mean: 6.662910
std dev: 0.366301
[NEW WAY]
Spread
mean: 350.346792
std dev: 110.449190
Time Taken (ms)
mean: 28.053907
std dev: 0.654964
Surprisingly, the aggregate spread of rolls was the same for both methods. I.e., std::mt19937
+std::uniform_int_distribution
was not "more uniform" than simple std::rand()
+%
. Another observation I made was that the new was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.
Is my experiment flawed in some way? Or is std::rand()
really not that bad, and maybe even better?
For reference, here is the code I used in its entirety:
#include <cstdio>
#include <random>
#include <algorithm>
#include <chrono>
int getRandNum_Old() {
static bool init = false;
if (!init) {
std::srand(time(nullptr)); // Seed std::rand
init = true;
}
return std::rand() % 6;
}
int getRandNum_New() {
static bool init = false;
static std::random_device rd;
static std::mt19937 eng;
static std::uniform_int_distribution<int> dist(0,5);
if (!init) {
eng.seed(rd()); // Seed random engine
init = true;
}
return dist(eng);
}
template <typename T>
double mean(T* data, int n) {
double m = 0;
std::for_each(data, data+n, [&](T x){ m += x; });
m /= n;
return m;
}
template <typename T>
double stdDev(T* data, int n) {
double m = mean(data, n);
double sd = 0.0;
std::for_each(data, data+n, [&](T x){ sd += ((x-m) * (x-m)); });
sd /= n;
sd = sqrt(sd);
return sd;
}
int main() {
const int N = 960000; // Number of trials
const int M = 1000; // Number of simulations
const int D = 6; // Num sides on die
/* Do the things the "old" way (blech) */
int freqList_Old[D];
double stdDevList_Old[M];
double timeTakenList_Old[M];
for (int j = 0; j < M; j++) {
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_Old, D, 0);
for (int i = 0; i < N; i++) {
int roll = getRandNum_Old();
freqList_Old[roll] += 1;
}
stdDevList_Old[j] = stdDev(freqList_Old, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_Old[j] = timeTaken;
}
/* Do the things the cool new way! */
int freqList_New[D];
double stdDevList_New[M];
double timeTakenList_New[M];
for (int j = 0; j < M; j++) {
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_New, D, 0);
for (int i = 0; i < N; i++) {
int roll = getRandNum_New();
freqList_New[roll] += 1;
}
stdDevList_New[j] = stdDev(freqList_New, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_New[j] = timeTaken;
}
/* Display Results */
printf("[OLD WAY]\n");
printf("Spread\n");
printf(" mean: %.6f\n", mean(stdDevList_Old, M));
printf(" std dev: %.6f\n", stdDev(stdDevList_Old, M));
printf("Time Taken (ms)\n");
printf(" mean: %.6f\n", mean(timeTakenList_Old, M));
printf(" std dev: %.6f\n", stdDev(timeTakenList_Old, M));
printf("\n");
printf("[NEW WAY]\n");
printf("Spread\n");
printf(" mean: %.6f\n", mean(stdDevList_New, M));
printf(" std dev: %.6f\n", stdDev(stdDevList_New, M));
printf("Time Taken (ms)\n");
printf(" mean: %.6f\n", mean(timeTakenList_New, M));
printf(" std dev: %.6f\n", stdDev(timeTakenList_New, M));
}
Pretty much any implementation of "old" rand()
use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.
Common failings of "bad" - but common enough - rand()
implementations are:
RAND_MAX
;Still, none of these are specific to the API of rand()
. A particular implementation could place a xorshift-family generator behind srand
/rand
and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.
Edit: @R. correctly notes that the rand
/srand
interface is limited by the fact that srand
takes an unsigned int
, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX
possible starting seeds (and thus generated sequences). This is true indeed, although the API could be trivially extended to make srand
take an unsigned long long
, or adding a separate srand(unsigned char *, size_t)
overload.
Indeed, the actual problem with rand()
is not much of implementation in principle but:
RAND_MAX
of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand
with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);simplistic interface; rand()
provides a single generator with the global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:
Finally, the rand
state of affairs:
time(NULL)
is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).Hence the new <random>
header, which tries to fix this mess providing algorithms that are:
... and a default random_device
as well to seed them.
Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint
& Co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).
Finally, to get back to your performance comparison: as others have specified, you are comparing a fast LCG with a slower (but generally considered better quality) Mersenne Twister; if you are ok with the quality of an LCG, you can use std::minstd_rand
instead of std::mt19937
.
Indeed, after tweaking your function to use std::minstd_rand
and avoid useless static variables for initialization
int getRandNum_New() {
static std::minstd_rand eng{std::random_device{}()};
static std::uniform_int_distribution<int> dist{0, 5};
return dist(eng);
}
I get 9 ms (old) vs 21 ms (new); finally, if I get rid of dist
(which, compared to the classic modulo operator, handles the distribution skew for output range not multiple of the input range) and get back to what you are doing in getRandNum_Old()
int getRandNum_New() {
static std::minstd_rand eng{std::random_device{}()};
return eng() % 6;
}
I get it down to 6 ms (so, 30% faster), probably because, unlike the call to rand()
, std::minstd_rand
is easier to inline.
Incidentally, I did the same test using a hand-rolled (but pretty much conforming to the standard library interface) XorShift64*
, and it's 2.3 times faster than rand()
(3.68 ms vs 8.61 ms); given that, unlike the Mersenne Twister and the various provided LCGs, it passes the current randomness test suites with flying colors and it's blazingly fast, it makes you wonder why it isn't included in the standard library yet.