python random cpython

How to generate a repeatable random number sequence?


I would like a function that generates a pseudo-random sequence of values, with that sequence being repeatable on every run. The data has to be reasonably well distributed over a given range; it doesn't have to be perfect.

I want to write some code which will have performance tests run on it, based on random data. I would like that data to be the same for every test run, on every machine, but I don't want to have to ship the random data with the tests for storage reasons (it might end up being many megabytes).

The library documentation for the random module doesn't appear to say that the same seed will always give the same sequence on any machine.

EDIT: If you're going to suggest I seed the generator (as I said above), please point to the documentation that says the approach is valid and will work on a range of machines/implementations.

EDIT: CPython 2.7.1 and PyPy 1.7 on Mac OS X, and CPython 2.7.1 and CPython 2.5.2 on Ubuntu, appear to give the same results. Still, no docs that stipulate this in black and white.

Any ideas?


Solution

  • The documentation does not explicitly say that a given seed will always produce the same results, but the algorithm behind Python's implementation of random guarantees it.

    According to the documentation, Python uses the Mersenne Twister as the core generator. Once this algorithm is seeded, it does not take any external input that would change subsequent calls, so give it the same seed and you will get the same results.

    Of course you can also observe this by setting a seed and generating large lists of random numbers and verifying that they are the same, but I understand not wanting to trust that alone.

    I have not checked other Python implementations besides CPython, but I highly doubt they would implement the random module using an entirely different algorithm.
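
As a rough illustration of the seeding approach described above, here is a minimal sketch. The function name make_test_data and its parameters are made up for this example. It uses a private random.Random instance so other code calling the module-level functions cannot disturb the sequence, and it builds values from random() alone, on the assumption that this is the method least likely to behave differently between versions.

```python
import random


def make_test_data(seed, count, low, high):
    """Return `count` pseudo-random floats between low and high.

    Hypothetical helper for this example, not part of the random module.
    """
    # A dedicated Random instance keeps the sequence independent of any
    # other code that calls the module-level random functions.
    rng = random.Random(seed)
    # Scale random() by hand instead of relying on higher-level helpers.
    return [low + (high - low) * rng.random() for _ in range(count)]


first = make_test_data(seed=12345, count=1000, low=0.0, high=100.0)
second = make_test_data(seed=12345, count=1000, low=0.0, high=100.0)
print(first == second)  # prints True: same seed, same sequence
```

Calling make_test_data with the same seed at the start of every test run should then regenerate the data instead of shipping it. It is still worth recording a few expected values in the test suite, so that a change in behaviour on some interpreter shows up as a failure rather than as silently different data.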