Search code examples
pythonscipydistributionbetabinomial-theorem

Finding alpha and beta of beta-binomial distribution with scipy.optimize and loglikelihood


A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters α > 0 and β > 0. The shape parameters define the probability of success. I want to find the values for α and β that best describe my data from the perspective of a beta-binomial distribution. My dataset players consist of data about the number of hits (H), the number of at-bats (AB) and the conversion (H / AB) of a lot of baseball players. I estimate the PDF with the help of the answer of JulienD in Beta Binomial Function in Python

from scipy.special import beta
from scipy.misc import comb

pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

Next, I write a loglikelihood function that we will minimize.

def loglike_betabinom(params, *args):
   """
   Negative log likelihood function for betabinomial distribution
   :param params: list for parameters to be fitted.
   :param args:  2-element array containing the sample data.
   :return: negative log-likelihood to be minimized.
   """

   a, b = params[0], params[1]
   k = args[0] # the conversion rate
   n = args[1] # the number of at-bats (AE)

   pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

   return -1 * np.log(pdf).sum()   

Now, I want to write a function that minimizes loglike_betabinom

 from scipy.optimize import minimize
 init_params = [1, 10]
 res = minimize(loglike_betabinom, x0=init_params,
                args=(players['H'] / players['AB'], players['AB']),
                bounds=bounds,
                method='L-BFGS-B',
                options={'disp': True, 'maxiter': 250})
 print(res.x)

The result is [-6.04544138 2.03984464], which implies that α is negative which is not possible. I based my script on the following R-snippet. They get [101.359, 287.318]..

 ll <- function(alpha, beta) { 
    x <- career_filtered$H
    total <- career_filtered$AB
    -sum(VGAM::dbetabinom.ab(x, total, alpha, beta, log=True))
 }

 m <- mle(ll, start = list(alpha = 1, beta = 10), 
 method = "L-BFGS-B", lower = c(0.0001, 0.1))

 ab <- coef(m)

Can someone tell me what I am doing wrong? Help is much appreciated!!


Solution

  • One thing to pay attention to is that comb(n, k) in your log-likelihood might not be well-behaved numerically for the values of n and k in your dataset. You can verify this by applying comb to your data and see if infs appear.

    One way to amend things could be to rewrite the negative log-likelihood as suggested in https://stackoverflow.com/a/32355701/4240413, i.e. as a function of logarithms of Gamma functions as in

    from scipy.special import gammaln
    import numpy as np
    
    def loglike_betabinom(params, *args):
    
        a, b = params[0], params[1]
        k = args[0] # the OVERALL conversions
        n = args[1] # the number of at-bats (AE)
    
        logpdf = gammaln(n+1) + gammaln(k+a) + gammaln(n-k+b) + gammaln(a+b) - \
         (gammaln(k+1) + gammaln(n-k+1) + gammaln(a) + gammaln(b) + gammaln(n+a+b))
    
        return -np.sum(logpdf) 
    

    You can then minimize the log-likelihood with

    from scipy.optimize import minimize
    
    init_params = [1, 10]
    # note that I am putting 'H' in the args
    res = minimize(loglike_betabinom, x0=init_params,
                args=(players['H'], players['AB']),
                method='L-BFGS-B', options={'disp': True, 'maxiter': 250})
    print(res)
    

    and that should give reasonable results.

    You could check How to properly fit a beta distribution in python? for inspiration if you want to rework further your code.