
Different standard error estimations from linearmodels and statsmodels package

Here is a reproducible example:

import numpy as np
import pandas as pd
# Outer is entity, inner is time
entity = list(map(chr,range(65,91)))
time = list(pd.date_range('1-1-2014',freq='A', periods=4))
index = pd.MultiIndex.from_product([entity, time])
df = pd.DataFrame(np.random.randn(26*4, 2),index=index, columns=['y','x'])

from linearmodels.panel import PanelOLS
mod = PanelOLS(df.y, df.x, entity_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)

This yields -0.1425 for the parameter estimate and 0.1396 for its standard error.

import statsmodels.formula.api as smf

df = df.reset_index()
lm = smf.ols('y ~ x - 1 + C(level_0)', df).fit(cov_type='cluster', cov_kwds={'groups': df['level_0']})
print(lm.params['x'], lm.bse['x'])

This yields -0.14249279008084645 and 0.16390753835717325. The coefficient matches, but the standard errors are not even close.


  • partial answer

    statsmodels cluster-robust standard errors have a "use_correction" option. Setting it to False makes the standard errors very close, but still different.
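To get a feel for how much "use_correction" can matter here, the small-sample factor can be worked out by hand. This is only a sketch: it assumes that statsmodels scales the cluster covariance by G/(G-1) * (N-1)/(N-K) when use_correction=True (my reading of its sandwich covariance code, not something stated in the question), with G = 26 clusters, N = 104 observations, and K = 27 parameters (26 entity dummies plus x).

```python
import math

# Panel dimensions from the question: 26 entities, 4 periods each.
n_groups = 26
nobs = 26 * 4
# Columns in 'y ~ x - 1 + C(level_0)': 26 entity dummies plus x.
k_params = n_groups + 1

# Assumed factor applied to the covariance when use_correction=True:
# G / (G - 1) * (N - 1) / (N - K)
cov_factor = n_groups / (n_groups - 1) * (nobs - 1) / (nobs - k_params)

# Standard errors scale with the square root of the covariance factor.
se_factor = math.sqrt(cov_factor)
print(cov_factor, se_factor)
```

Under that assumption the corrected standard error is roughly 1.18 times the uncorrected one, which is large because the entity dummies use up 26 of the 104 degrees of freedom.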

    I am using a random seed for reproducibility, so the numbers below differ from those in the question.


        lm = smf.ols('y ~ x - 1 + C(level_0)', df).fit(
            cov_type='cluster',
            cov_kwds={'groups': df['level_0'], 'use_correction': False})
        print(lm.params['x'], lm.bse['x'])  # statsmodels
        -0.011615385632341074 0.11481503664560508
        res.params['x'], res.std_errors['x']  # linearmodels
        (-0.011615385632341178, 0.11537104491755208)

    And linearmodels has an auto_df=False fit option that brings its standard errors close to the statsmodels default, agreeing to about two decimal places.
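The small gap that remains in the use_correction=False comparison can be checked with plain arithmetic. A minimal sketch, using only the two standard errors printed above and assuming N = 26 * 4 = 104 observations: it compares their ratio with sqrt(N / (N - 1)).

```python
import math

nobs = 26 * 4  # 26 entities x 4 periods

# Standard errors copied from the output above.
se_statsmodels = 0.11481503664560508   # statsmodels, use_correction=False
se_linearmodels = 0.11537104491755208  # linearmodels, default fit options

ratio = se_linearmodels / se_statsmodels
expected = math.sqrt(nobs / (nobs - 1))
gap = abs(ratio - expected)
print(ratio, expected, gap)
```

The two values agree very closely, which suggests that the remaining difference is an N / (N - 1) scaling of the covariance. That is an inference from these printed numbers, not something confirmed from the linearmodels source.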