Search code examples
pythonrscipybetaparameterization

alpha and beta estimates for beta binomial and beta distributions


I am trying to fit my data to a beta-binomial distribution and estimate the alpha and beta shape parameters. For this distribution, the prior is taken from a beta distribution. Python does not have a fit function for beta-binomial but it does for beta. The python beta fitting and R beta binomial fitting is close but systematically off.

R:

library("VGAM")
x = c(222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508)
y = c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)
fit=vglm(cbind(y, x) ~ 1, betabinomialff, trace = TRUE)
Coef(fit)
   shape1    shape2 
  1.736093 26.870768

python:

import scipy.stats
import numpy as np
x = np.array([222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508], dtype=float)
y = np.array([2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17])
scipy.stats.beta.fit((y)/(x+y), floc=0, fscale=1)
    (1.5806623978910086, 24.031893492546242, 0, 1)

I have done this many times and it seems like python is systematically a little bit lower than the R results. I was wondering if this is an input error on my part or just a difference in the way they are calculated?


Solution

  • Your problem is that fitting a beta-binomial model is just not the same as fitting a Beta model with the values equal to the ratios. I'm going to illustrate here with the bbmle package, which will fit similar models to VGAM (but with which I'm more familiar).

    Preliminaries:

    library("VGAM")  ## for dbetabinom.ab
    x <- c(222,909,918,814,970,346,746,419,610,737,
           201,865,573,188,450,229,629,708,250,508)
    y <- c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)
    
    library("bbmle")
    

    Fit beta-binomial model:

    mle2(y~dbetabinom.ab(size=x+y,shape1,shape2),
         data=data.frame(x,y),
         start=list(shape1=2,shape2=30))
    ## Coefficients:
    ##    shape1    shape2 
    ##  1.736046 26.871526 
    

    This agrees more or less perfectly with the VGAM result you quote.

    Now use the same framework to fit a Beta model instead:

    mle2(y/(x+y) ~ dbeta(shape1,shape2),
         data=data.frame(x,y),
         start=list(shape1=2,shape2=30))
    ## Coefficients:
    ##    shape1    shape2 
    ## 1.582021 24.060570 
    

    This fits your Python, beta-fit result. (I'm sure if you used VGAM to fit the Beta you'd get the same answer too.)