Search code examples
pythonstatisticsstatsmodels

Is it possible to simulate a distribution from a glm fit in statsmodels?


I would like to fit a multiple linear regression. It will have a couple input parameters and no intercept.

Is it possible to fit a glm fit with statsmodels then use that fit to simulate a distribution of predicted values?


Solution

  • In the next, upcoming 0.14 release, there is a get_distribution method based on predicted distribution parameters that return an instance of a scipy or scipy compatible distribution class. The methods of those include rvs for simulating data. (excluding Tweedie family for which no distribution class exists yet)

    Some of this is already available in earlier version.

    For some distributions families like gaussian, poisson, binomial the parameters are directly available. For other distributions like negative binomial or gamma there is a conversion from GLM parameterization to standard distribution parameterization.

    For example for gaussian family, the dependent variable is normally distributed with mean given by results.fittedvalues or results.predict(x) and variance given by results.scale

    using from scipy import stats

        scale_n = scale / var_weights
        return stats.norm(loc=mu, scale=np.sqrt(scale_n))
    

    https://github.com/statsmodels/statsmodels/blob/main/statsmodels/genmod/families/family.py#L675

    Note: scale in scipy.stats distributions corresponds to the standard deviation, while it corresponds to the variance in GLM and OLS.

    var_weights are like weights in WLS for the case when variance differs across observations (heteroscedasticity). By default it is just ones like in OLS and can be ignored. (var_weights are currently only implemented for GLM.)
    For non-gaussian families, var_weights is a multiplicative factor for the inherent heteroscedasticity where variance is a function of the mean.