
Efficiently generating discrete random numbers


I want to quickly generate discrete random numbers from a distribution whose CDF is known. Essentially, the algorithm is:

  1. Construct the CDF vector cdf (an increasing vector starting at 0 and ending at 1)
  2. Generate a uniform(0, 1) random number u
    • If u < cdf[1] choose 1
    • else if u < cdf[2] choose 2
    • else if u < cdf[3] choose 3, and so on.
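
For a single draw, the steps above can be sketched as a plain linear scan. This is only an illustration, not a fast implementation: `sample_one` and `cdf_small` are hypothetical names, and the sketch assumes the first entry of cdf is already the cumulative probability of outcome 1 (as in the example below) rather than 0.

```r
# Hypothetical helper: one draw by linear scan over the CDF vector.
# Assumes cdf is increasing and its last entry is 1.
sample_one <- function(cdf, u = runif(1)) {
  for (i in seq_along(cdf)) {
    if (u < cdf[i]) return(i)   # first entry that exceeds u
  }
  length(cdf)                   # guard in case the last entry is slightly below 1
}

cdf_small <- c(0.2, 0.5, 1.0)   # P(1) = 0.2, P(2) = 0.3, P(3) = 0.5
sample_one(cdf_small, u = 0.1)  # 1
sample_one(cdf_small, u = 0.3)  # 2
sample_one(cdf_small, u = 0.7)  # 3
```

The scan costs O(length(cdf)) per draw, which is why a vectorised lookup is wanted for large N.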

Example

First generate a CDF:

cdf = cumsum(runif(10000, 0, 0.1))
cdf = cdf/max(cdf)

Next generate N uniform random numbers:

N = 1000
u = runif(N)

Now sample the value:

## With some experimenting, this seemed to be very quick.
## However, with N = 100000 we run out of memory.
## N = 10^6 would be a reasonable maximum to cope with.
colSums(sapply(u, ">", cdf))
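
The memory problem most likely comes from the intermediate result: sapply(u, ">", cdf) builds a length(cdf) x N logical matrix, and with 10000 CDF entries and N = 100000 that is 10^9 logicals at 4 bytes each. One way to keep the same computation but bound the peak memory is to process u in chunks. The sketch below is only an illustration: `count_below_chunked` and `chunk_size` are hypothetical names, and smaller sizes are used so that it runs quickly.

```r
set.seed(1)
cdf <- cumsum(runif(1000, 0, 0.1))
cdf <- cdf / max(cdf)
u <- runif(5000)

# Hypothetical helper: same result as colSums(sapply(u, ">", cdf)),
# but only a length(cdf) x chunk_size matrix exists at any one time.
count_below_chunked <- function(u, cdf, chunk_size = 1000) {
  if (length(u) == 0) return(integer(0))
  out <- integer(length(u))
  for (s in seq(1, length(u), by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, length(u))
    # outer() gives a matrix with entry [i, j] = cdf[i] < u[idx[j]]
    out[idx] <- as.integer(colSums(outer(cdf, u[idx], "<")))
  }
  out
}

res <- count_below_chunked(u, cdf)
```

This still does length(cdf) x N comparisons in total, so it addresses the memory issue but not the amount of work.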

Solution

  • How about using cut:

    N <- 1e6
    u <- runif(N)
    system.time(as.numeric(cut(u,cdf)))
       user  system elapsed 
       1.03    0.03    1.07 
    
    head(table(as.numeric(cut(u,cdf))))
    
      1   2   3   4   5   6 
     51  95 165 172 148  75
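
    One thing to watch with cut: the intervals are (cdf[k], cdf[k+1]], so any u at or below cdf[1] comes back as NA rather than 0. A closely related base R function, findInterval, returns the number of cdf entries that are <= u and so matches the colSums version from the question (apart from exact ties), without building factor labels. This is a swapped-in alternative rather than part of the answer above, sketched here with a smaller N; no timings are claimed.

```r
set.seed(42)
cdf <- cumsum(runif(10000, 0, 0.1))
cdf <- cdf / max(cdf)
u <- runif(1e5)

idx_fi  <- findInterval(u, cdf)      # 0 when u < cdf[1], otherwise count of cdf entries <= u
idx_cut <- as.numeric(cut(u, cdf))   # NA when u <= cdf[1]

below_first <- u <= cdf[1]           # draws that fall before the first break
# Outside that region the two approaches give the same index
all(idx_fi[!below_first] == idx_cut[!below_first])
```

    If the first outcome should be reachable, one option is to prepend 0 to cdf before calling cut or findInterval, so that the first interval is (0, cdf[1]].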