Replicating seed setting from Stata


I'm trying to replicate in R a bit of code someone else wrote in Stata, and have hit a wall trying to predict the behavior of its pseudo-random number generator (PRNG).

Their code has this snippet:

set seed 123456

Unfortunately, it's unclear exactly which algorithm Stata uses. This question suggests it's a KISS algorithm, but the asker didn't manage to replicate it in the end (and some of the links there seem to be dead or outdated). The Stata manual entry for set seed doesn't mention anything about algorithms, and this other question doesn't seem to have been resolved either.

Is it a fool's errand to try and replicate Stata's random numbers?

I don't know which version of Stata was used to create this.


Solution

  • In short: Yes, it is a fool's errand.

    Stata, being proprietary software, hasn't released all of the details of its core components, such as its random number generator. However, some documentation is available (link for Stata 14); the most pertinent passages are:

    runiform() is the basis for all the other random-number functions because all the other random-number functions transform uniform (0, 1) random numbers to the specified distribution.

    runiform() implements the Mersenne Twister 64-bit (MT64) and the “keep it simple stupid” 32-bit (KISS32) algorithms for generating uniform (0, 1) random numbers. runiform() uses the MT64 algorithm by default.

    runiform() uses the KISS32 algorithm only when the user version is less than 14 or when the random-number generator has been set to kiss32...
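
    Which algorithm is active matters as much as the seed. A minimal base-R sketch of that point, using two of R's own built-in generators purely as stand-ins (they are not Stata's MT64 or KISS32, and `draw_with` is a hypothetical helper name): the same seed fed to two different algorithms gives unrelated streams.

    ```r
    # Draw n uniforms from a given RNG kind with a fixed seed,
    # restoring the session's previous RNG settings afterwards.
    draw_with <- function(kind, seed = 123456, n = 5) {
      old <- RNGkind()                          # remember the current kinds
      on.exit(do.call(RNGkind, as.list(old)))   # put them back on exit
      set.seed(seed, kind = kind)
      runif(n)
    }

    mt <- draw_with("Mersenne-Twister")
    mm <- draw_with("Marsaglia-Multicarry")

    identical(mt, draw_with("Mersenne-Twister"))  # same kind, same seed: reproducible
    isTRUE(all.equal(mt, mm))                     # different kind, same seed: streams differ
    ```

    So without knowing the Stata version (and hence the generator in use), the seed alone does not pin down the numbers.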

    Recall also from ?Random in R that, for the Mersenne Twister:

    The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

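    Stata builds the corresponding internal state from the seed you give it, by a procedure that, as far as I can tell, is not documented in detail, so that state is nearly impossible to guess. Note too that R's Mersenne-Twister is the 32-bit variant, whereas Stata's default is the 64-bit MT64, so the two would not produce the same stream even from the same seed.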
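
    To see what that state looks like on the R side, here is a short base-R sketch. After `set.seed()`, `.Random.seed` holds an integer code for the generator kind, the current position, and then the 624 state words; the integer seed is only the input to an initialisation routine, and it is the state that determines the stream.

    ```r
    old <- RNGkind()                          # remember the session's RNG settings
    set.seed(123456, kind = "Mersenne-Twister")

    state <- .Random.seed                     # integer vector kept in the global environment
    length(state)                             # kind code + position + 624 state words
    pos   <- state[2]                         # current position within the 624 words
    words <- state[-(1:2)]                    # the 624 32-bit integers themselves

    # Restoring a saved state replays the stream exactly.
    a <- runif(3)
    assign(".Random.seed", state, envir = globalenv())
    b <- runif(3)
    identical(a, b)

    do.call(RNGkind, as.list(old))            # put the previous settings back
    ```

    Reproducing Stata's draws in R would require both the same algorithm and the same initial state, and neither can be recovered from `set seed 123456` alone.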

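    I suggest you instead generate the random numbers in Stata, save them to a .dta file, and read them into R (as a vector, matrix, etc.), for example with haven:

    library(haven)
    # "mydata.dta" stands for whatever file was saved from Stata
    mydata <- read_dta("mydata.dta")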