
Rapidly generating ~ 10^9 steps of a random process in R


I have the following task to perform:

Generate 10^9 steps of the process described by the formula:

X(0)=0
X(t+1)=X(t)+Y(t)

where the Y(t) are independent random variables with distribution N(0,1). Calculate the percentage of indices t for which X(t) is negative.

I tried the following code:

  x<-c(0,0)
  z<-0
  loop<-10^9
  for(i in 2:loop) {
    x[1]<-x[2]
    x[2]<-x[1]+rnorm(1, 0, 1)
    if (x[2]<0) {z<-z+1}
  }

However, it is very slow. How can I speed it up?


Solution

  • One solution is to use the vectorized approach proposed by @G5W, but break it into smaller chunks to avoid running out of memory. This gives you the speed of the vectorized solution, while the chunk size controls how much memory the process uses.

    The following breaks the problem into blocks of 1e+07 steps; looping 100 times gives a total of 1e+09 steps.

    For each block, you record the fraction of steps spent below 0 and the ending point of the walk. The ending point is then fed to the next block as its starting point, and so on.

    At the end, average the 100 per-block fractions to get the overall fraction of time below zero (a plain average is valid because all blocks have the same length). The calls to cat in the loop only print progress and can be commented out.

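    # Simulate one block of the walk, starting from position `start`.
    # Returns c(fraction of steps below 0, position at the end of the block).
    funky <- function(start, length = 1e+07) {
      Y <- rnorm(length)
      Z <- cumsum(Y)   # displacement relative to `start`
      # start + Z < 0 is equivalent to Z < -start
      c(sum(Z<(-start))/length, (tail(Z, 1) + start))
    }
    
    starttime <- Sys.time()
    resvect <- vector(mode = "numeric", length = 100)   # per-block fractions
    result <- funky(0)   # first block starts at X(0) = 0
    resvect[1] <- result[1]
    i <- 2
    while (i < 101) {
      cat(result, "\n")   # progress: previous block's fraction and end point
      result <- funky(result[2])   # continue from the previous end point
      resvect[i] <- result[1]
      i <- i + 1
    }
    mean(resvect)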
    # [1] 0.1880392
    endtime <- Sys.time()
    elapsed <- endtime - starttime
    elapsed
    # Time difference of 1.207566 mins
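
As a sanity check on the chunking idea, here is a minimal sketch that accumulates a count of negative positions instead of averaging per-block fractions, so the final block may be shorter than the others. The function name `frac_negative_chunked` and its arguments are hypothetical, not part of the answer above. With R's default random number generators, consecutive `rnorm` calls continue the same stream, so on a small run the chunked result should agree with the one-shot vectorized result for the same seed, up to floating-point rounding.

```r
# Hypothetical helper (not from the original answer): fraction of steps at
# which a standard-normal random walk started at 0 is negative.
frac_negative_chunked <- function(n_steps, chunk = 1e7) {
  pos   <- 0   # walk position at the end of the previous chunk
  n_neg <- 0   # running count of negative positions (stored as double)
  done  <- 0   # number of steps simulated so far
  while (done < n_steps) {
    m <- min(chunk, n_steps - done)   # the last chunk may be shorter
    x <- pos + cumsum(rnorm(m))       # positions within this chunk
    n_neg <- n_neg + sum(x < 0)
    pos   <- x[m]                     # carry the end point forward
    done  <- done + m
  }
  n_neg / n_steps
}

# Small-scale comparison against the one-shot vectorized version.
set.seed(42)
chunked <- frac_negative_chunked(1e5, chunk = 1e4)
set.seed(42)
direct <- mean(cumsum(rnorm(1e5)) < 0)
```

For the full problem you would call `frac_negative_chunked(1e9)`; the chunk size trades memory use against the number of loop iterations.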