Speed of random sampling from distribution in R

I was trying to investigate a probability distribution whose moments were the Catalan numbers, and came up with

qcatmo <- function(p, k=4){ (qbeta(p/2+1/2, 3/2, 3/2)*2 - 1)^2 * k } 
colMeans(outer(qcatmo(ppoints(10^6)), 0:10, "^"))
#      1     1     2     5    14    42   132   429  1430  4862 16796

which worked out nicely. But then I tried to generate random values from this distribution, and found three possible approaches (A using the quantile function I already knew worked applied to runif, B slightly more direct using the built-in rbeta function, and C using a form of rejection sampling with runif) with noticeably different speeds when used on large samples:

rcatmoA <- function(n, k=4){ qcatmo(runif(n), k) }
rcatmoB <- function(n, k=4){ (rbeta(n, 3/2, 3/2)*2 - 1)^2 * k }
rcatmoC <- function(n, k=4){
             n0 <- ceiling(n*4/pi + 7*sqrt(n) + 35)
             x0 <- runif(n0)^2 
             y0 <- runif(n0)^2 
             x0[x0 + y0 < 1][1:n] * k
             }

which when benchmarked gave

library(microbenchmark)
n <- 10^4
microbenchmark(
  rcatmoA(n,4),
  rcatmoB(n,4),
  rcatmoC(n,4)
  ) 
#Unit: microseconds
#          expr     min       lq      mean   median       uq     max neval cld
# rcatmoA(n, 4) 22817.2 23014.95 23259.688 23186.95 23322.80 25128.9   100   c
# rcatmoB(n, 4)  1526.5  1534.40  1615.255  1541.30  1607.15  4952.1   100  b 
# rcatmoC(n, 4)   781.5   788.70   884.339   795.00   813.80  7266.2   100 a

My questions are:

Why is the B version so much faster than the A version?
If the B version is faster because it avoids applying a function to runif data, why is the C version even faster?
Is there any general advice on how best to make random samples in this sort of situation?

Solution

As you have found out, there are different ways to perform a random variate generation of the same distribution (the beta distribution in this case).

I presume the A version is slowest because the beta distribution's quantile has no closed form, so that qbeta must resort to numerical inversion (made all the more difficult because the beta distribution's CDF, the regularized beta function, is generally known only as an integral). Indeed, as the source code for qbeta shows, the quantile function is far from trivial. Look at the R source code.
The source code for rbeta, which the B version relies on, uses the beta distribution algorithms by Cheng (1978). In this particular case, Algorithm BB is used, which employs a rejection sampler (and also takes at least two logarithms per iteration). In contrast, the C version employs a simpler rejection condition. Cheng's algorithms are one of numerous algorithms that have been proposed over the years for the beta distribution, and algorithms from 1986 and earlier are mentioned in Non-Uniform Random Variate Generation by L. Devroye.