I am trying to translate some python code to C++. What the code does is to run a monte carlo simulation. I thought the results from Python and C++ could be very close, but seems something funny happened.
Here is what I do in Python:
self.__length = 100
self.__monte_carlo_array=np.random.uniform(0.0, 1.0, self.__length)
Here is what I do in C++:
int length = 100;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_real_distribution<double> distribution(0, 1);
for(int i = 0; i < length; i++)
{
double d = distribution(mt);
monte_carlo_array[i] = d;
}
I ran above random number generation 100x5 times both in Python and C++, and then do monte carlo simulation with these random numbers.
In monte carlo simulation, I set the threshold as 0.5, thus I can easily verify if the results are uniform distributed.
Here is a conceptual draft what monte carlo simulation does:
for(i = 0; i < length; i++)
{
if(monte_carlo_array[i] > threshold) // threshold = 0.5
monte_carlo_output[i] = 1;
else
monte_carlo_output[i] = 0;
}
Since the length of the monte carlo array is 120, I expect to see 60 1
s both in Python and C++. I calculate the average number of 1
s and found that, although the average number in C++ and Python is around 60, but the trend are highly correlated. Moreover, the average number in Python is always higher than in C++.
May I know if this is because I've done something wrong, or it is simply because the difference between random generation mechanisms in C++ and Python?
[edit] Please note that the RNG in Python is also the Mersenne Twister 19937.
I wrote this based on the code posted:
import numpy as np
length = 1000
monte_carlo_array=np.random.uniform(0.0, 1.0, length)
# print monte_carlo_array
threshold = 0.5
above = 0
for i in range (0,length):
if monte_carlo_array[i] > threshold:
above+=1
print above
and this in C++:
#include <random>
#include <iostream>
int main()
{
const int length = 1000;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_real_distribution<double> distribution(0, 1);
double threshold = 0.5;
double monte_carlo_array[length];
for(int i = 0; i < length; i++)
{
double d = distribution(mt);
monte_carlo_array[i] = d;
}
int above = 0;
for(int i = 0; i < length; i++)
{
if (monte_carlo_array[i] > threshold)
{
above++;
}
}
std::cout << above << std::endl;
}
Five runs each gives:
Python:
480
507
485
515
506
average:
498.6
C++:
499
484
531
509
509
average
506.4
So if anything I'm finding that C++ is higher than python. But I think it's more a case of "random numbers are not uniformly distributed with a small number of samples."
I changed length to 100000 instead, and still the results vary up and down around 50k:
Python:
50235
49752
50215
49717
49974
Average:
49978.6
C++:
50085
50018
49993
49779
49966
Average:
49968.2
In summary, I don't think it's any huge difference between the random number implementations in C++ and Python when it comes to "how uniform it is around 0.5". But I've not studied statistics very much (and it was many years ago).