I was trying to investigate a probability distribution whose moments were the Catalan numbers, and came up with
qcatmo <- function(p, k=4){ (qbeta(p/2+1/2, 3/2, 3/2)*2 - 1)^2 * k }
colMeans(outer(qcatmo(ppoints(10^6)), 0:10, "^"))
# 1 1 2 5 14 42 132 429 1430 4862 16796
which worked out nicely. But then I tried to generate random values from this distribution, and found three possible approaches (A using the quantile function I already knew worked applied to runif
, B slightly more direct using the built-in rbeta
function, and C using a form of rejection sampling with runif
) with noticeably different speeds when used on large samples:
rcatmoA <- function(n, k=4){ qcatmo(runif(n), k) }
rcatmoB <- function(n, k=4){ (rbeta(n, 3/2, 3/2)*2 - 1)^2 * k }
rcatmoC <- function(n, k=4){
n0 <- ceiling(n*4/pi + 7*sqrt(n) + 35)
x0 <- runif(n0)^2
y0 <- runif(n0)^2
x0[x0 + y0 < 1][1:n] * k
}
which when benchmarked gave
library(microbenchmark)
n <- 10^4
microbenchmark(
rcatmoA(n,4),
rcatmoB(n,4),
rcatmoC(n,4)
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# rcatmoA(n, 4) 22817.2 23014.95 23259.688 23186.95 23322.80 25128.9 100 c
# rcatmoB(n, 4) 1526.5 1534.40 1615.255 1541.30 1607.15 4952.1 100 b
# rcatmoC(n, 4) 781.5 788.70 884.339 795.00 813.80 7266.2 100 a
My questions are:
runif
data, why is the C version even faster?As you have found out, there are different ways to perform a random variate generation of the same distribution (the beta distribution in this case).
qbeta
must resort to numerical inversion (made all the more difficult because the beta distribution's CDF, the regularized beta function, is generally known only as an integral). Indeed, as the source code for qbeta
shows, the quantile function is far from trivial. Look at the R source code.rbeta
, which the B version relies on, uses the beta distribution algorithms by Cheng (1978). In this particular case, Algorithm BB is used, which employs a rejection sampler (and also takes at least two logarithms per iteration). In contrast, the C version employs a simpler rejection condition. Cheng's algorithms are one of numerous algorithms that have been proposed over the years for the beta distribution, and algorithms from 1986 and earlier are mentioned in Non-Uniform Random Variate Generation by L. Devroye.