Search code examples
pythonc++numpyrandomrandom-seed

MT19937 Generator in C++ and NumPy generate different numbers


I am trying to reproduce some C++ code in Python that involves random number generations. The C++ code uses the MT19937 generator as follows:

#include <random>
#include <iostream>

int main() {
    std::mt19937 generator(1234);
    std::uniform_real_distribution<double> distribution(0.0, 1.0);

    for (int i = 0; i < 10; ++i) {
        std::cout << distribution(generator) << std::endl;
    }

    return 0;
}

The Python version is (with NumPy 1.23.3)

import numpy as np

rng = np.random.Generator(np.random.MT19937(1234))
for _ in range(10):
    print(rng.random())

In both cases, the random seed is set to 1234. But the two produce different outputs on my machine (macOS 14.0 ARM). The C++ code outputs

0.497664
0.817838
0.612112
0.77136
0.86067
0.150637
0.198519
0.815163
0.158815
0.116138

while the Python code outputs

0.12038356302504949
0.4037014194964441
0.8777026256367374
0.9565788014497463
0.42646002242298486
0.28304326113156464
0.9009410688498408
0.830833142531224
0.6752899264264728
0.3977176012599666

Why do the two MT19937 generators produce different sequences despite the same seed? How (if possible) can I make them the same?


Solution

  • The C++ standard library version of the mersenne twister engine is different from all other versions in how the state is initialized from an integer seed.

    You can either use another mt19937 library in C++ that matches the numpy method (which I think is the way that most people implement mt19937), or you can change how your Python engine is seeded to match:

    import numpy as np
    import numpy.random
    
    WORD_SIZE = 32  # The template argument after the type
    STATE_SIZE = 624  # The next template argument (Also `len(np.random.MT19937().state['state']['key'])`)
    INITIALIZATION_MULTIPLIER = 1812433253  # The last template argument
    DEFAULT_SEED = 5489  # A constant
    
    def cpp_seed_mt19937(seed = DEFAULT_SEED):
        state = np.zeros(STATE_SIZE, dtype=np.uint32)
        state[0] = seed
        for j in range(1, STATE_SIZE):
            state[j] = INITIALIZATION_MULTIPLIER * (state[j-1] ^ (state[j-1] >> (WORD_SIZE - 2))) + j
        result = np.random.MT19937()
        result.state = {'bit_generator': 'MT19937', 'state': {'key': state, 'pos': STATE_SIZE - 1}}
        result.random_raw(1)  # Start at index "STATE_SIZE-1" and advance by 1 to advance past the generated state
        return result
        
    engine = cpp_seed_mt19937(2)
    print(*engine.random_raw(10), sep='\n')
    
    #include <random>
    #include <iostream>
    
    int main() {
        std::mt19937 e(2);
        for (int i = 0; i < 10; ++i)
            std::cout << e() << '\n';
    }
    

    These both should produce the same output:

    1872583848
    794921487
    111352301
    4000937544
    2360782358
    4070471979
    1869695442
    2081981515
    1805465960
    1376693511
    

    Now converting these 32 bit numbers into floating point numbers between 0 and 1 will have different results depending on how the algorithm is implemented. You would have to use some way that is standardised to give the same result given the same random number from the mersenne twister.

    (Also, std::uniform_real_distribution<double> distribution(0.0, 1.0); will not give you the same sequence of numbers on different platforms)

    Alternatively, you can have a small portion of C++ code that you call with ctypes or otherwise that only generates the random numbers for you, and you can have the rest of the code in Python.