Search code examples
c++pythonrandommontecarlo

Difference between C++ random number generation and Python


I am trying to translate some python code to C++. What the code does is to run a monte carlo simulation. I thought the results from Python and C++ could be very close, but seems something funny happened.

Here is what I do in Python:

self.__length = 100
self.__monte_carlo_array=np.random.uniform(0.0, 1.0, self.__length)

Here is what I do in C++:

int length = 100;
std::random_device rd;
std::mt19937_64 mt(rd());
std::uniform_real_distribution<double> distribution(0, 1);

for(int i = 0; i < length; i++)
{
    double d = distribution(mt);
    monte_carlo_array[i] = d;
}

I ran above random number generation 100x5 times both in Python and C++, and then do monte carlo simulation with these random numbers.

In monte carlo simulation, I set the threshold as 0.5, thus I can easily verify if the results are uniform distributed.

Here is a conceptual draft what monte carlo simulation does:

for(i = 0; i < length; i++)
{
    if(monte_carlo_array[i] > threshold)    // threshold = 0.5
        monte_carlo_output[i] = 1;
    else
        monte_carlo_output[i] = 0;
}

Since the length of the monte carlo array is 120, I expect to see 60 1s both in Python and C++. I calculate the average number of 1s and found that, although the average number in C++ and Python is around 60, but the trend are highly correlated. Moreover, the average number in Python is always higher than in C++.

distribution chart May I know if this is because I've done something wrong, or it is simply because the difference between random generation mechanisms in C++ and Python?

[edit] Please note that the RNG in Python is also the Mersenne Twister 19937.


Solution

  • I wrote this based on the code posted:

    import numpy as np
    
    length = 1000
    monte_carlo_array=np.random.uniform(0.0, 1.0, length)
    # print monte_carlo_array
    threshold = 0.5
    above = 0
    
    for i in range (0,length):
        if monte_carlo_array[i] > threshold:
            above+=1
    print above
    

    and this in C++:

    #include <random> 
    #include <iostream>
    
    int main()
    {
        const int length = 1000;
        std::random_device rd;
        std::mt19937_64 mt(rd());
        std::uniform_real_distribution<double> distribution(0, 1);
        double threshold = 0.5;
        double monte_carlo_array[length];
    
        for(int i = 0; i < length; i++)
        {
            double d = distribution(mt);
            monte_carlo_array[i] = d;
        }
        int above = 0;
    
        for(int i = 0; i < length; i++)
        {
            if (monte_carlo_array[i] > threshold)
            {
                above++;
            }
        }
        std::cout << above << std::endl;
    }
    

    Five runs each gives:

    Python:
    480
    507
    485
    515
    506
    average:
    498.6
    
    C++:
    499
    484
    531
    509
    509
    average
    506.4
    

    So if anything I'm finding that C++ is higher than python. But I think it's more a case of "random numbers are not uniformly distributed with a small number of samples."

    I changed length to 100000 instead, and still the results vary up and down around 50k:

    Python:
    
    50235
    49752
    50215
    49717
    49974
    
    Average: 
    49978.6
    
    C++:
    
    50085
    50018
    49993
    49779
    49966
    
    Average:
    49968.2
    

    In summary, I don't think it's any huge difference between the random number implementations in C++ and Python when it comes to "how uniform it is around 0.5". But I've not studied statistics very much (and it was many years ago).