I am trying to fit my data to a beta-binomial distribution and estimate the alpha and beta shape parameters. For this distribution, the prior is taken from a beta distribution. Python does not have a fit function for beta-binomial but it does for beta. The python beta fitting and R beta binomial fitting is close but systematically off.
R:
library("VGAM")
x = c(222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508)
y = c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)
fit=vglm(cbind(y, x) ~ 1, betabinomialff, trace = TRUE)
Coef(fit)
shape1 shape2
1.736093 26.870768
python:
import scipy.stats
import numpy as np
x = np.array([222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508], dtype=float)
y = np.array([2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17])
scipy.stats.beta.fit((y)/(x+y), floc=0, fscale=1)
(1.5806623978910086, 24.031893492546242, 0, 1)
I have done this many times and it seems like python is systematically a little bit lower than the R results. I was wondering if this is an input error on my part or just a difference in the way they are calculated?
Your problem is that fitting a beta-binomial model is just not the same as fitting a Beta model with the values equal to the ratios. I'm going to illustrate here with the bbmle
package, which will fit similar models to VGAM
(but with which I'm more familiar).
Preliminaries:
library("VGAM") ## for dbetabinom.ab
x <- c(222,909,918,814,970,346,746,419,610,737,
201,865,573,188,450,229,629,708,250,508)
y <- c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)
library("bbmle")
Fit beta-binomial model:
mle2(y~dbetabinom.ab(size=x+y,shape1,shape2),
data=data.frame(x,y),
start=list(shape1=2,shape2=30))
## Coefficients:
## shape1 shape2
## 1.736046 26.871526
This agrees more or less perfectly with the VGAM
result you quote.
Now use the same framework to fit a Beta model instead:
mle2(y/(x+y) ~ dbeta(shape1,shape2),
data=data.frame(x,y),
start=list(shape1=2,shape2=30))
## Coefficients:
## shape1 shape2
## 1.582021 24.060570
This fits your Python, beta-fit result. (I'm sure if you used VGAM
to fit the Beta you'd get the same answer too.)