MT19937 Generator in C++ and NumPy generate different numbers

I am trying to reproduce some C++ code in Python that involves random number generations. The C++ code uses the MT19937 generator as follows:

#include <random>
#include <iostream>

int main() {
    std::mt19937 generator(1234);
    std::uniform_real_distribution<double> distribution(0.0, 1.0);

    for (int i = 0; i < 10; ++i) {
        std::cout << distribution(generator) << std::endl;
    }

    return 0;
}

The Python version is (with NumPy 1.23.3)

import numpy as np

rng = np.random.Generator(np.random.MT19937(1234))
for _ in range(10):
    print(rng.random())

In both cases, the random seed is set to 1234. But the two produce different outputs on my machine (macOS 14.0 ARM). The C++ code outputs

while the Python code outputs

0.12038356302504949
0.4037014194964441
0.8777026256367374
0.9565788014497463
0.42646002242298486
0.28304326113156464
0.9009410688498408
0.830833142531224
0.6752899264264728
0.3977176012599666

Why do the two MT19937 generators produce different sequences despite the same seed? How (if possible) can I make them the same?

Solution

The C++ standard library version of the mersenne twister engine is different from all other versions in how the state is initialized from an integer seed.

You can either use another mt19937 library in C++ that matches the numpy method (which I think is the way that most people implement mt19937), or you can change how your Python engine is seeded to match:

import numpy as np
import numpy.random

WORD_SIZE = 32  # The template argument after the type
STATE_SIZE = 624  # The next template argument (Also `len(np.random.MT19937().state['state']['key'])`)
INITIALIZATION_MULTIPLIER = 1812433253  # The last template argument
DEFAULT_SEED = 5489  # A constant

def cpp_seed_mt19937(seed = DEFAULT_SEED):
    state = np.zeros(STATE_SIZE, dtype=np.uint32)
    state[0] = seed
    for j in range(1, STATE_SIZE):
        state[j] = INITIALIZATION_MULTIPLIER * (state[j-1] ^ (state[j-1] >> (WORD_SIZE - 2))) + j
    result = np.random.MT19937()
    result.state = {'bit_generator': 'MT19937', 'state': {'key': state, 'pos': STATE_SIZE - 1}}
    result.random_raw(1)  # Start at index "STATE_SIZE-1" and advance by 1 to advance past the generated state
    return result
    
engine = cpp_seed_mt19937(2)
print(*engine.random_raw(10), sep='\n')

#include <random>
#include <iostream>

int main() {
    std::mt19937 e(2);
    for (int i = 0; i < 10; ++i)
        std::cout << e() << '\n';
}

These both should produce the same output:

Now converting these 32 bit numbers into floating point numbers between 0 and 1 will have different results depending on how the algorithm is implemented. You would have to use some way that is standardised to give the same result given the same random number from the mersenne twister.

(Also, std::uniform_real_distribution<double> distribution(0.0, 1.0); will not give you the same sequence of numbers on different platforms)

Alternatively, you can have a small portion of C++ code that you call with ctypes or otherwise that only generates the random numbers for you, and you can have the rest of the code in Python.