python regression panel statsmodels linearmodels

Difference between linearmodels PanelOLS and statsmodels OLS


I am running two regressions that I thought would yield identical results and I'm wondering whether anyone can explain why they are different. One is with statsmodels OLS and the other is with linearmodels PanelOLS.

A minimal working example is shown below. The coefficients are similar but not identical (0.1167 and 0.3514 from statsmodels, 0.1101 and 0.3100 from linearmodels), and the R-squared values differ substantially (0.953 vs 0.767).


import statsmodels.formula.api as smf
from linearmodels import PanelOLS
from statsmodels.datasets import grunfeld

data = grunfeld.load_pandas().data

#   Define formula and run statsmodels OLS regression
ols_formula = 'invest ~ value + capital + C(firm) + C(year) -1'
ols_fit   = smf.ols(ols_formula,data).fit()

#   Set multiindex and run PanelOLS regression
data = data.set_index(['firm','year'])
panel_fit = PanelOLS(data.invest,data[['value','capital']],entity_effects=True).fit()

#   Look at results
print(ols_fit.summary())
print(panel_fit)

Any insight appreciated!


Solution

  • To reproduce the same coefficients, pass both entity_effects=True and time_effects=True to PanelOLS, so that it matches the C(firm) and C(year) dummies in the OLS formula:

    import statsmodels.formula.api as smf
    from linearmodels import PanelOLS
    from statsmodels.datasets import grunfeld
    
    data = grunfeld.load_pandas().data
    
    #   Define formula and run statsmodels OLS regression
    ols_formula = 'invest ~ value + capital + C(firm) + C(year) -1'
    ols_fit = smf.ols(ols_formula,data).fit()
    
    #   Set multiindex and run PanelOLS regression
    data = data.set_index(['firm','year'])
    panel_fit = PanelOLS(
        data.invest,
        data[['value','capital']],
        entity_effects=True,
        time_effects=True
    ).fit()
    
    #   Look at results
    print(ols_fit.summary())
    print(panel_fit)
    

    Which leads to:

    OLS

    value           0.1167
    capital         0.3514
    R-squared:      0.953
    

    PANEL OLS

    value          0.1167
    capital        0.3514
    R-squared:     0.7253
    

    However, the R-squared values will still differ, because the two regressions measure fit against different baselines. In PanelOLS there are only two regressors (value and capital); the firm and year fixed effects are absorbed rather than estimated as explicit coefficients, so the variation they account for is not counted toward the R-squared. In the OLS regression, every firm and year dummy is a regressor alongside value and capital, so the variation explained by the dummies is counted as well, which naturally gives a higher R-squared.
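The point about matching coefficients but differing R-squared can be checked on a small example. The following is a minimal sketch using only NumPy and synthetic balanced-panel data (not the Grunfeld data, and not how PanelOLS is implemented internally). All names here (`demean_two_way`, `r_squared`, the simulated effects) are hypothetical and chosen for illustration. It assumes that the R-squared reported by a fixed-effects estimator corresponds to the fit on the demeaned data, while the dummy-variable regression measures fit against the raw outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years = 5, 8

# Balanced panel, sorted by firm and then by year (synthetic data).
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)

x = rng.normal(size=(n_firms * n_years, 2))
firm_fx = rng.normal(scale=3.0, size=n_firms)[firm]
year_fx = rng.normal(scale=2.0, size=n_years)[year]
noise = rng.normal(scale=0.5, size=firm.size)
y = x @ np.array([0.1, 0.3]) + firm_fx + year_fx + noise


def r_squared(target, fitted):
    """Centered R-squared: 1 - SSR / TSS."""
    resid = target - fitted
    centered = target - target.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


# 1) Dummy-variable regression: slopes plus explicit firm and year dummies.
firm_d = (firm[:, None] == np.arange(n_firms)).astype(float)
# Drop the first year dummy to avoid perfect collinearity with the firm dummies.
year_d = (year[:, None] == np.arange(1, n_years)).astype(float)
X_lsdv = np.column_stack([x, firm_d, year_d])
beta_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)
r2_lsdv = r_squared(y, X_lsdv @ beta_lsdv)


# 2) Within regression: remove firm and year means, then regress on x only.
def demean_two_way(a):
    """Two-way demeaning; valid as written only for a balanced panel."""
    a = a.reshape(n_firms, n_years, -1)
    out = (
        a
        - a.mean(axis=1, keepdims=True)       # firm means
        - a.mean(axis=0, keepdims=True)       # year means
        + a.mean(axis=(0, 1), keepdims=True)  # grand mean
    )
    return out.reshape(n_firms * n_years, -1)


y_w = demean_two_way(y)[:, 0]
x_w = demean_two_way(x)
beta_within, *_ = np.linalg.lstsq(x_w, y_w, rcond=None)
r2_within = r_squared(y_w, x_w @ beta_within)

print("slopes (dummies):", beta_lsdv[:2])
print("slopes (within): ", beta_within)
print("R-squared (dummies):", r2_lsdv)
print("R-squared (within): ", r2_within)
```

Both approaches leave the same residuals, so the slopes agree. The dummy regression divides the residual sum of squares by the total variation in the raw outcome, while the within regression divides it by the smaller variation left after the firm and year means are removed, hence the lower R-squared.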