Search code examples
pythonstatisticsstatsmodelspanel-datarandom-effects

MixedLMResults object return NaN BIC. What can be the reason?


Here is my code:

import statsmodels.formula.api as smf
md = smf.mixedlm("dep ~ indep", df, groups=df["groups"], re_formula='~indep')
mdf = md.fit(method=["lbfgs"]) 

mdf.bic returns nan as output. What can be the reason? If it is package related problem. Could anyone provide manual calculation of BIC for this case?


Solution

  • Check the comment below your question from @StupidWolf, I think it's correct that BIC is not provided for LM models. Even when you print mdf.summary(), you will find neither BIC nor AIC in the output.

    But if you want to calculate BIC, the formula is quite simple. You can even refer to the code in statsmodels:

    def bic(self):
        """Bayesian information criterion"""
        if self.reml:
            return np.nan
        if self.freepat is not None:
            df = self.freepat.get_packed(use_sqrt=False, has_fe=True).sum() + 1
        else:
            df = self.params.size + 1
        return -2 * self.llf + np.log(self.nobs) * df
    

    This should answer your question:

    • self.reml (REstricted Maximum Likelihood) is set to true - that's the reason why are you getting nan for BIC.
    • it shows you, how to calculate BIC

    You can see that BIC calculation follow a standard BIC formula. It takes logarithms of number of data points/observations multiplied by number of free parameters/degree of freedom (np.log(self.nobs) * df) and subtract Log-likelihood of model times 2 (-2 * self.llf). So, you could calculate BIC with your code as follows:

    -2 * mdf.llf + np.log(mdf.nobs) * (mdf.df_modelwc)
    

    Note, that here I used mdf.df_modelwc which in fact, in this case, returns mdf.params.size.

    But beware of the things mentioned in the beginning (for completeness, here is the link to SE provided by @StupidWolf - How should mixed effects models be compared and or validated?.