
Compute the Bayes factor of an A/B test dataset in R


I am trying to compute the Bayes factor of an A/B test dataset that can be found here. However, I end up with NaN because the beta function evaluates to zero. To calculate the likelihoods, I am assuming the data follow a binomial distribution, so I am using this formula:

likelihood = choose(n,k) * Beta(k+1,n-k+1)
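If the conversion rate p is given a uniform Beta(1, 1) prior (an interpretation, not something stated in the question), this formula is exactly the binomial likelihood integrated over p, and it collapses to a closed form:

$$
\int_0^1 \binom{n}{k} p^{k} (1-p)^{n-k}\,dp
= \binom{n}{k}\,\mathrm{B}(k+1,\, n-k+1)
= \frac{n!}{k!\,(n-k)!}\cdot\frac{k!\,(n-k)!}{(n+1)!}
= \frac{1}{n+1}.
$$

So the product itself is a modest number; the huge binomial coefficient and the tiny beta value cancel analytically, just not when each is evaluated separately in floating point. It also means the ratio of the two groups' terms is (n2 + 1)/(n1 + 1).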

Here is my code:

data <- read.csv(file = "ab_data.csv", header = TRUE, sep = ",")

# Split the rows by experiment group
control   <- data[which(data$group == "control"), ]
treatment <- data[which(data$group == "treatment"), ]

# Compute the Bayes factor: trials (n) and conversions (r) per group
n1 <- nrow(control)
r1 <- sum(control$converted)
n2 <- nrow(treatment)
r2 <- sum(treatment$converted)

likelihood_control   <- choose(n1, r1) * beta(r1 + 1, n1 - r1 + 1)
likelihood_treatment <- choose(n2, r2) * beta(r2 + 1, n2 - r2 + 1)
bayes_factor <- likelihood_control / likelihood_treatment

# Inspect the beta terms (these are the values that come out as 0)
beta(r1 + 1, n1 - r1 + 1)
beta(r2 + 1, n2 - r2 + 1)
bayes_factor
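A minimal sketch that reproduces the failure without the CSV. The counts n = 100000 and k = 12000 are hypothetical, chosen only to be about as large as a real A/B test group; they are not taken from ab_data.csv.

```r
# Hypothetical counts, large like a real A/B test group
n <- 100000
k <- 12000

big   <- choose(n, k)             # too large for a double: overflows to Inf
small <- beta(k + 1, n - k + 1)   # too small for a double: underflows to 0
big * small                       # Inf * 0 is NaN in IEEE floating point

# The same quantities on the log scale are ordinary finite numbers
lchoose(n, k)
lbeta(k + 1, n - k + 1)
```

Only the product is NaN; each factor is merely out of range, which is why working with logarithms avoids the problem.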

Solution

  • As you observed, the problem is that beta() returns 0. This is not because the likelihood is actually 0: the value is smaller than the smallest positive double R can represent (about 4.9e-324), so it underflows to 0. The second issue is that choose() returns Inf. Again, the value is not actually infinite; it exceeds the largest representable double (about 1.8e308), so it overflows. Multiplying Inf by 0 is what produces the NaN. The solution is to work with logarithms, which keep these numbers in a representable range, and to exponentiate only at the end. The code below should work (I tested the logchoose function, and it seems to work):

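    # log(choose(n, k)) as a sum of logs, so the huge integer is never formed.
    # (Base R's lchoose(n, k) computes the same quantity.)
    logchoose <- function(n, k) {
      if (k == 0) return(0)  # choose(n, 0) = 1; seq() and 1:k misbehave when k = 0
      num <- sum(log(seq(n - k + 1, n)))
      denom <- sum(log(seq_len(k)))
      return(num - denom)
    }
    
    # Log-likelihoods: neither term overflows or underflows on this scale
    loglik_control <- logchoose(n1, r1) + lbeta(r1 + 1, n1 - r1 + 1)
    loglik_treatment <- logchoose(n2, r2) + lbeta(r2 + 1, n2 - r2 + 1)
    # Exponentiate only the difference of the logs
    bayes_factor <- exp(loglik_control - loglik_treatment)
    bayes_factor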
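Base R already provides lchoose() alongside lbeta(), so the hand-written helper is optional. A minimal sketch of the same log-scale computation; the function name log_marginal_likelihood and the counts are hypothetical, for illustration only. Because choose(n, k) * beta(k + 1, n - k + 1) equals 1 / (n + 1) for every k, the result can be sanity-checked against (n2 + 1) / (n1 + 1).

```r
# Log of choose(n, k) * beta(k + 1, n - k + 1), never leaving the log scale.
# Hypothetical helper name; lchoose() and lbeta() are base R functions.
log_marginal_likelihood <- function(n, k) {
  lchoose(n, k) + lbeta(k + 1, n - k + 1)
}

# Hypothetical counts standing in for the control and treatment groups
n1 <- 100000; r1 <- 12000
n2 <- 100500; r2 <- 11900

log_bf <- log_marginal_likelihood(n1, r1) - log_marginal_likelihood(n2, r2)
bayes_factor <- exp(log_bf)
bayes_factor

# Sanity check: each term is 1 / (n + 1), so the ratio is (n2 + 1) / (n1 + 1)
all.equal(bayes_factor, (n2 + 1) / (n1 + 1))
```

The check also shows that, with this formula, the ratio depends only on the group sizes and not on the conversion counts.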