Here is reproducible code:
import numpy as np
import pandas as pd

# Outer index level is entity, inner level is time
entity = list(map(chr, range(65, 91)))  # 26 entities: 'A' through 'Z'
time = list(pd.date_range('1-1-2014', freq='A', periods=4))  # 4 annual periods
index = pd.MultiIndex.from_product([entity, time])
df = pd.DataFrame(np.random.randn(26 * 4, 2), index=index, columns=['y', 'x'])
from linearmodels.panel import PanelOLS
mod = PanelOLS(df.y, df.x, entity_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)
print(res)
This yields a parameter estimate of -0.1425 and a clustered standard error of 0.1396.
import statsmodels.formula.api as smf

df = df.reset_index()  # the entity level becomes the 'level_0' column
lm = smf.ols('y ~ x - 1 + C(level_0)', df).fit(cov_type='cluster',
                                               cov_kwds={'groups': df['level_0']})
print(lm.params['x'], lm.bse['x'])
This yields -0.14249279008084645 and 0.16390753835717325: the parameter estimates agree, but the standard errors are not even close.
Partial answer
statsmodels' cluster-robust standard errors have a use_correction option; turning it off makes the standard errors very close, though still not identical.
I am using a random seed for reproducibility:
np.random.seed(9865378)
lm = smf.ols('y ~ x - 1 + C(level_0)', df).fit(
    cov_type='cluster',
    cov_kwds={'groups': df['level_0'], 'use_correction': False})
print(lm.params['x'], lm.bse['x']) # statsmodels
-0.011615385632341074 0.11481503664560508
res.params['x'], res.std_errors['x'] # linearmodels
(-0.011615385632341178, 0.11537104491755208)
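For intuition about why the defaults disagreed in the first place: a rough sketch, assuming statsmodels applies the usual Stata-style finite-sample factor c = G/(G-1) * (N-1)/(N-K) when use_correction is True (its default), which inflates the clustered standard errors by roughly sqrt(c). This is an assumption about the form of the correction, not something taken from either library's documentation.
# Rough sketch of the assumed finite-sample cluster correction
G = df['level_0'].nunique()          # number of clusters (entities): 26
N = len(df)                          # number of observations: 104
K = lm.model.exog.shape[1]           # estimated parameters (26 dummies + x): 27
c = G / (G - 1) * (N - 1) / (N - K)  # roughly 1.39
print(c ** 0.5)                      # SEs would scale by about 1.18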
And linearmodels has an auto_df=False fit option that brings its standard errors close to those of the statsmodels default, agreeing to two decimal places.
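For reference, a minimal sketch of that refit, reusing the mod object built above (auto_df is a keyword of PanelOLS.fit; the name res_nodf is just for illustration):
# Refit with the automatic degree-of-freedom adjustment disabled
res_nodf = mod.fit(cov_type='clustered', cluster_entity=True, auto_df=False)
print(res_nodf.params['x'], res_nodf.std_errors['x'])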