
Curve fitting using Maximum Likelihood Estimator implementations not working


I'm implementing a Maximum Likelihood Estimator for discrete count data for the purpose of curve fitting, using the result of curve_fit as the starting point for minimize. I defined and tried these methods for multiple distributions, but for simplicity will include just one, the log-series distribution.

At this point I have also tried the following methods from statsmodels:

  1. statsmodels.discrete.discrete_model.fit
  2. statsmodels.discrete.count_model.fit
  3. statsmodels.base.model.GenericLikelihoodModel

Most curve fits tend to run into overflow errors, or produce NaNs and zeros inside the objective. I will detail these errors in another post.
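For the GenericLikelihoodModel route, the usual pattern is to subclass it and override nloglikeobs with the per-observation negative log-likelihood. A minimal sketch for the log-series case (my own illustration, not code from the post; it expands the frequency table into raw observations first, and the class name LogSeriesMLE is hypothetical):

```python
import numpy as np
from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel

x = np.arange(1, 28)
y = np.array([18899, 10427, 6280, 4281, 2736, 1835, 1158, 746, 467, 328,
              201, 129, 65, 69, 39, 21, 15, 10, 3, 3, 1, 1, 1, 1, 1, 1, 1])

#Expand the frequency table into individual observed counts
sample = np.repeat(x, y)

class LogSeriesMLE(GenericLikelihoodModel):
    def nloglikeobs(self, params):
        #per-observation negative log-likelihood of the log-series PMF
        p = params[0]
        return -stats.logser.logpmf(self.endog, p)

res = LogSeriesMLE(sample).fit(start_params=[0.7], disp=0)
print(res.params)
```

Note that this maximizes the actual discrete likelihood over the raw draws, so its estimate can legitimately differ from a least-squares fit to the frequency curve.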

#Import a few packages
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import curve_fit, minimize
from numpy import log

#Given data
x=np.arange(1, 28, 1)
y=np.array([18899, 10427, 6280, 4281, 2736, 1835, 1158, 746, 467, 328, 201, 129, 65, 69, 39, 21, 15, 10, 3, 3, 1, 1, 1, 1, 1, 1, 1])

#Define a custom distribution (the log-series PMF)
def Logser(x, p):
    return (-p**x) / (x * log(1 - p))

#Doing a least squares curve fit
def lsqfit(x, y):
    cf_result = curve_fit(Logser, x, y, p0=0.7, bounds=(0.5, 1), method='trf')
    return cf_result

param_guess=lsqfit(x,y)[0][0]   
print(param_guess)

#Doing a custom MLE definition, minimized using the scipy minimize function
def MLERegression(param_guess):
    yhat = Logser(x, param_guess)  #predictions based on a parameter value
    sd = 1  #initial guess for fitting a normal error distribution around the regressed curve
    #next, we flip the Bayesian question: compute the PDF of the observed values
    #normally distributed around the mean (yhat) with a standard deviation of sd
    negLL = -np.sum(stats.norm.logpdf(y, loc=yhat, scale=sd))  #negative log-likelihood
    return negLL

results = minimize(MLERegression, param_guess, method='L-BFGS-B', bounds=[(0.5, 1.0)], options={'disp': True})
final_param=results['x']
print(final_param)

I've constrained the optimizer to give me results similar to what I expect (a parameter value around 0.8 or 0.9); the algorithm outputs zero otherwise.
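As a sanity check, the discrete likelihood can also be maximized directly by weighting the log-series log-PMF by the observed frequencies, instead of assuming Gaussian errors around the regressed curve. A sketch of my own (not from the post); since it treats y as counts of observations rather than curve points, its estimate need not agree with the least-squares fit:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

x = np.arange(1, 28)
y = np.array([18899, 10427, 6280, 4281, 2736, 1835, 1158, 746, 467, 328,
              201, 129, 65, 69, 39, 21, 15, 10, 3, 3, 1, 1, 1, 1, 1, 1, 1])

def neg_loglike(p):
    #frequency-weighted negative log-likelihood of the log-series PMF
    return -np.sum(y * stats.logser.logpmf(x, p))

#bounded 1-D minimization keeps p strictly inside (0, 1), avoiding log(1-p) blowups
res = minimize_scalar(neg_loglike, bounds=(0.01, 0.99), method='bounded')
print(res.x)
```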


Solution

  • I think this is due to scaling. When I change the equation to scale * (-p**x)/(x * log(1 - p)) by adding a scaling factor, I get the following values without using any bounds: p = 0.90360 and scale = 5.1189e4, which yield the plot below,

    and my fitted value for p is indeed 0.9.
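For completeness, the scaled fit described above can be reproduced roughly like this (a sketch; the function name scaled_logser and the starting values are mine, and I add bounds on p purely to keep log(1 - p) defined during iteration, which the answer did not need):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(1, 28)
y = np.array([18899, 10427, 6280, 4281, 2736, 1835, 1158, 746, 467, 328,
              201, 129, 65, 69, 39, 21, 15, 10, 3, 3, 1, 1, 1, 1, 1, 1, 1])

def scaled_logser(x, p, scale):
    #log-series shape with a free amplitude, so the raw counts
    #no longer have to match a PMF that sums to 1
    return scale * (-p**x) / (x * np.log(1 - p))

#start the amplitude near the total count; bound p away from 1
popt, _ = curve_fit(scaled_logser, x, y, p0=[0.7, y.sum()],
                    bounds=([0.0, 0.0], [0.999, np.inf]), method='trf')
p_fit, scale_fit = popt
print(p_fit, scale_fit)
```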