Tags: python, pandas, statsmodels, non-linear-regression

Statsmodels - Negative Binomial doesn't converge while GLM does converge


I'm trying to do a Negative Binomial regression using Python's statsmodels package. The model estimates fine when using the GLM routine, i.e.

import statsmodels.api as sm
import statsmodels.formula.api as smf

model = smf.glm(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed", data=df, family=sm.families.NegativeBinomial()).fit()
model.summary()

However, the GLM routine doesn't estimate alpha, the dispersion term. I tried using the Negative Binomial routine directly (which does estimate alpha), i.e.

nb = smf.negativebinomial(formula="Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed", data=df).fit()
nb.summary()

But this doesn't converge. Instead I get the message:

Warning: Desired error not necessarily achieved due to precision loss.
     Current function value: nan
     Iterations: 0
     Function evaluations: 1
     Gradient evaluations: 1

My question is:

Do the two routines use different methods of estimation? Is there a way to make the smf.negativebinomial routine use the same estimation method as the GLM routine?


Solution

  • discrete.NegativeBinomial uses either a Newton method (the default in statsmodels) or the scipy optimizers. The main problem is that the exponential mean function can easily produce overflow, or very large gradients and Hessians, while the optimizer is still far away from the optimum. There are some attempts in the fit method to find good starting values, but these do not always work.

    A few possibilities that I usually try:

    • check that no regressor has large values, e.g. rescale so the maximum is below 10
    • use method='nm' (Nelder-Mead) as the initial optimizer and switch to newton or bfgs after some iterations or after convergence, as in the sketch after this list
    • try to come up with better starting values (see the point about GLM below)
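
    A minimal sketch of the two-stage optimization from the second point, assuming the formula and df from the question; the iteration counts are arbitrary choices:

        import statsmodels.formula.api as smf

        formula = "Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed"
        mod = smf.negativebinomial(formula=formula, data=df)

        # Stage 1: Nelder-Mead is derivative-free, so it tolerates the
        # region where gradients and Hessians overflow.
        res_nm = mod.fit(method='nm', maxiter=500, disp=False)

        # Stage 2: restart a gradient-based optimizer from the
        # Nelder-Mead solution to refine it.
        res = mod.fit(start_params=res_nm.params, method='bfgs', maxiter=200)
        print(res.summary())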

    GLM uses iteratively reweighted least squares (IRLS) by default, which is only standard for one-parameter families, i.e. it takes the dispersion parameter as given. So the same method cannot be used directly for the full MLE in discrete NegativeBinomial.

    GLM negative binomial still specifies the full loglikelihood. So it is possible to do a grid search over the dispersion parameter, using GLM.fit() to estimate the mean parameters for each value of the dispersion parameter. This should be equivalent to the corresponding discrete NegativeBinomial version (nb2? I don't remember). It could also be used as start_params for the discrete version, as in the sketch below.
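
    A hedged sketch of that grid search, again assuming the formula and df from the question; the alpha grid is an arbitrary choice:

        import numpy as np
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        formula = "Sales_Focus_2016 ~ Sales_Focus_2015 + A_Calls + A_Ed"

        # Profile out the dispersion: fit a GLM for each fixed alpha
        # and keep the fit with the highest log-likelihood.
        best = None
        for alpha in np.linspace(0.01, 2.0, 50):
            res = smf.glm(formula=formula, data=df,
                          family=sm.families.NegativeBinomial(alpha=alpha)).fit()
            if best is None or res.llf > best[0]:
                best = (res.llf, alpha, res)

        llf, alpha_hat, res_glm = best

        # The discrete model stacks the parameters as [mean params, alpha],
        # so the profiled GLM solution doubles as start_params.
        start = np.append(res_glm.params, alpha_hat)
        nb = smf.negativebinomial(formula=formula, data=df).fit(start_params=start)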

    In the statsmodels master version, there is now a connection that allows arbitrary scipy optimizers instead of the ones that were hardcoded. scipy recently obtained trust-region Newton methods, and will get more in the future, which should work for more cases than the simple Newton method in statsmodels. (However, most likely that does not currently work for discrete NegativeBinomial; I just found out about a possible problem https://github.com/statsmodels/statsmodels/issues/3747 )
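
    If that connection works for a given model, the call would look roughly like the following. This is only a sketch, assuming a statsmodels version that routes fit through scipy.optimize.minimize via method='minimize'; per the caveat above, it may fail for discrete NegativeBinomial:

        # Sketch only: hand optimization to scipy.optimize.minimize and
        # request its trust-region Newton solver; this may not yet work
        # for discrete NegativeBinomial (see the issue linked above).
        res = smf.negativebinomial(formula=formula, data=df).fit(
            method='minimize', min_method='trust-ncg')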