I am trying to reproduce some C++ code in Python that involves random number generations. The C++ code uses the MT19937 generator as follows:
#include <random>
#include <iostream>
int main() {
std::mt19937 generator(1234);
std::uniform_real_distribution<double> distribution(0.0, 1.0);
for (int i = 0; i < 10; ++i) {
std::cout << distribution(generator) << std::endl;
}
return 0;
}
The Python version is (with NumPy 1.23.3)
import numpy as np
rng = np.random.Generator(np.random.MT19937(1234))
for _ in range(10):
print(rng.random())
In both cases, the random seed is set to 1234
. But the two produce different outputs on my machine (macOS 14.0 ARM). The C++ code outputs
0.497664
0.817838
0.612112
0.77136
0.86067
0.150637
0.198519
0.815163
0.158815
0.116138
while the Python code outputs
0.12038356302504949
0.4037014194964441
0.8777026256367374
0.9565788014497463
0.42646002242298486
0.28304326113156464
0.9009410688498408
0.830833142531224
0.6752899264264728
0.3977176012599666
Why do the two MT19937 generators produce different sequences despite the same seed? How (if possible) can I make them the same?
The C++ standard library version of the mersenne twister engine is different from all other versions in how the state is initialized from an integer seed.
You can either use another mt19937
library in C++ that matches the numpy method (which I think is the way that most people implement mt19937), or you can change how your Python engine is seeded to match:
import numpy as np
import numpy.random
WORD_SIZE = 32 # The template argument after the type
STATE_SIZE = 624 # The next template argument (Also `len(np.random.MT19937().state['state']['key'])`)
INITIALIZATION_MULTIPLIER = 1812433253 # The last template argument
DEFAULT_SEED = 5489 # A constant
def cpp_seed_mt19937(seed = DEFAULT_SEED):
state = np.zeros(STATE_SIZE, dtype=np.uint32)
state[0] = seed
for j in range(1, STATE_SIZE):
state[j] = INITIALIZATION_MULTIPLIER * (state[j-1] ^ (state[j-1] >> (WORD_SIZE - 2))) + j
result = np.random.MT19937()
result.state = {'bit_generator': 'MT19937', 'state': {'key': state, 'pos': STATE_SIZE - 1}}
result.random_raw(1) # Start at index "STATE_SIZE-1" and advance by 1 to advance past the generated state
return result
engine = cpp_seed_mt19937(2)
print(*engine.random_raw(10), sep='\n')
#include <random>
#include <iostream>
int main() {
std::mt19937 e(2);
for (int i = 0; i < 10; ++i)
std::cout << e() << '\n';
}
These both should produce the same output:
1872583848
794921487
111352301
4000937544
2360782358
4070471979
1869695442
2081981515
1805465960
1376693511
Now converting these 32 bit numbers into floating point numbers between 0 and 1 will have different results depending on how the algorithm is implemented. You would have to use some way that is standardised to give the same result given the same random number from the mersenne twister.
(Also, std::uniform_real_distribution<double> distribution(0.0, 1.0);
will not give you the same sequence of numbers on different platforms)
Alternatively, you can have a small portion of C++ code that you call with ctypes
or otherwise that only generates the random numbers for you, and you can have the rest of the code in Python.