The following code fits a oversimplified generalized linear model using statsmodels
model = smf.glm('Y ~ 1', family=sm.families.NegativeBinomial(), data=df)
results = model.fit()
This gives the coefficient and a stderr:
coef stderr
Intercept 2.9471 0.120
Now I want to graphically compare the real distribution of the variable Y (histogram) with the distribution that comes from the model.
But I need two parameters r
and p
to evaluate the stats.nbinom(r,p)
and plot it.
Is there a way to retrieve the parameters from the results of the fitting? How can I plot the PMF?
Generalized linear models, GLM, in statsmodels currently does not estimate the extra parameter of the Negative Binomial distribution. Negative Binomial belongs to the exponential family of distributions only for fixed shape parameter.
However, statsmodels also has Negative Binomial as a Maximum Likelihood Model in discrete_model
which estimates all parameters.
The parameterization of the Negative Binomial for count regression is in terms of the mean or expected value, which is different from the parameterization in scipy.stats.nbinom. Actually, there are two different commonly used parameterization for the Negative Binomial count regression, usually called nb1
and nb2
Here is a quickly written script that recovers the scipy.stats.nbinom parameters, n=size
and p=prob
from the estimated parameters. Once you have the parameters for the scipy.stats.distribution you can use all the available method, rvs, pmf, and so on.
Something like this should be made available in statsmodels.
In a few example runs, I got results like this
data generating parameters 50 0.25
estimated params 51.7167511571 0.256814610633
estimated params 50.0985814878 0.249989725917
Aside, because of the underlying exponential reparameterization, the scipy optimizers have sometimes problems to converge. In those cases, either providing better starting values or using Nelder-Mead as optimization method usually helps.
import numpy as np
from scipy import stats
import statsmodels.api as sm
# generate some data to check
nobs = 1000
n, p = 50, 0.25
dist0 = stats.nbinom(n, p)
y = dist0.rvs(size=nobs)
x = np.ones(nobs)
loglike_method = 'nb1' # or use 'nb2'
res = sm.NegativeBinomial(y, x, loglike_method=loglike_method).fit(start_params=[0.1, 0.1])
print dist0.mean()
print res.params
mu = res.predict() # use this for mean if not constant
mu = np.exp(res.params[0]) # shortcut, we just regress on a constant
alpha = res.params[1]
if loglike_method == 'nb1':
Q = 1
elif loglike_method == 'nb2':
Q = 0
size = 1. / alpha * mu**Q
prob = size / (size + mu)
print 'data generating parameters', n, p
print 'estimated params ', size, prob
#estimated distribution
dist_est = stats.nbinom(size, prob)
BTW: I ran into this before but didn't have time to look at it https://github.com/statsmodels/statsmodels/issues/106