I would like to fit a multiple linear regression. It will have a couple of input parameters and no intercept.
Is it possible to fit a GLM with statsmodels and then use that fit to simulate a distribution of predicted values?
In the upcoming 0.14 release, there is a get_distribution method that uses the predicted distribution parameters to return an instance of a scipy (or scipy-compatible) distribution class. The methods of those distribution instances include rvs for simulating data. (The Tweedie family is excluded, because no distribution class exists for it yet.)
Some of this is already available in earlier versions.
For some distribution families, like gaussian, poisson, and binomial, the parameters are directly available. For other distributions, like negative binomial or gamma, there is a conversion from the GLM parameterization to the standard distribution parameterization.
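As an example of such a conversion, the gamma family's GLM parameterization (mean mu, scale phi, variance phi * mu**2) maps to scipy's shape/scale form as a = 1/phi and scale = mu * phi. The numbers below are hypothetical fitted values, used only to check that the mapping reproduces the GLM mean and variance:

```python
import numpy as np
from scipy import stats

# Hypothetical fitted values from a gamma-family GLM: mean mu, scale phi.
mu = np.array([2.0, 5.0, 10.0])
phi = 0.5

# GLM gamma parameterization -> scipy parameterization:
# shape a = 1/phi, scale s = mu*phi, so mean = a*s = mu and
# variance = a*s**2 = phi * mu**2 (the gamma family variance function).
dist = stats.gamma(a=1.0 / phi, scale=mu * phi)

print(dist.mean())  # equals mu
print(dist.var())   # equals phi * mu**2
```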
For example, for the gaussian family the dependent variable is normally distributed with mean given by results.fittedvalues (or results.predict(x)) and variance given by results.scale. The relevant code in statsmodels is essentially:

```python
from scipy import stats

scale_n = scale / var_weights
return stats.norm(loc=mu, scale=np.sqrt(scale_n))
```

https://github.com/statsmodels/statsmodels/blob/main/statsmodels/genmod/families/family.py#L675
Note: scale in scipy.stats distributions corresponds to the standard deviation, while in GLM and OLS it corresponds to the variance.
var_weights are like weights in WLS, for the case when the variance differs across observations (heteroscedasticity). By default they are just ones, as in OLS, and can be ignored. (var_weights are currently only implemented for GLM.)
For non-gaussian families, var_weights is a multiplicative factor for the inherent heteroscedasticity, where the variance is a function of the mean.
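For the gaussian case, the effect of var_weights is that observation i has variance scale / w_i, matching the scale_n = scale / var_weights line in the snippet above. A small sketch, with hypothetical fitted means and weights, verifies this by simulation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical fitted means and GLM scale (variance) estimate.
mu = np.array([1.0, 2.0, 3.0, 4.0])
scale = 2.0

# Observations with var_weights w have variance scale / w:
# w > 1 means more precise (lower-variance) observations.
var_weights = np.array([1.0, 1.0, 4.0, 4.0])
dist = stats.norm(loc=mu, scale=np.sqrt(scale / var_weights))

draws = dist.rvs(size=(100_000, 4), random_state=rng)
print(draws.var(axis=0))  # close to scale / var_weights
```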