
Is there a way to derive the intercept of the firm fixed effect from the Python PanelOLS model?


I am estimating a fixed-effects model on panel data using the Python statsmodels package.

The data contain X and Y observed over time for several companies. Below is a small sample; the full data set is a balanced panel of about 5,000 companies covering one year.

| date       | firm | X1 | X2 | X3 | Y |
|:---------- |:----:|:--:|:--:|:--:|--:|
| 2021-01-01 | A    | 1  | 4  | 1  | 10|
| 2021-01-02 | A    | 2  | 7  | 0  | 21|
| 2021-01-03 | A    | 4  | 3  | 1  | 12|
| 2021-01-01 | B    | 2  | 1  | 0  | 4 |
| 2021-01-02 | B    | 3  | 7  | 1  | 9 |
| 2021-01-03 | B    | 7  | 1  | 1  | 4 |

When I estimate a fixed-effects model that controls for the company effect with the code below, it runs without any problems.

    mod = PanelOLS.from_formula('Y ~ X1 + X2 + X3 + EntityEffects',
                                data=df.set_index(['firm', 'date']))
    result = mod.fit(cov_type='clustered', cluster_entity=True)
    result.summary

[Output: image of the PanelOLS summary table]

However, the summary does not report an intercept term, and I would like to find a way to obtain it.

Is there an option to force the intercept term to be reported?


Solution

  • It is not very clear from the project's repository, but the firm effects appear to be stored under result.estimated_effects. You should also mention in the question that PanelOLS is from linearmodels, not statsmodels.

    from linearmodels import PanelOLS
    import pandas as pd
    
    df = pd.DataFrame({
        'date': ['2021-01-01', '2021-01-02', '2021-01-03',
                 '2021-01-01', '2021-01-02', '2021-01-03'],
        'firm': ['A', 'A', 'A', 'B', 'B', 'B'],
        'X1': [1, 2, 4, 2, 3, 7],
        'X2': [4, 7, 3, 1, 7, 1],
        'X3': [1, 0, 1, 0, 1, 1],
        'Y': [10, 21, 12, 4, 9, 4],
    })
    
    df['date'] = pd.to_datetime(df['date'])
    
    mod = PanelOLS.from_formula('Y ~ X1 + X2 + X3 + EntityEffects',
                                data=df.set_index(['firm', 'date']))
    
    result = mod.fit(cov_type='clustered', cluster_entity=True)
    result.estimated_effects

    # Output:
                     estimated_effects
    firm date                         
    A    2021-01-01           8.179545
         2021-01-02           8.179545
         2021-01-03           8.179545
    B    2021-01-01           0.258438
         2021-01-02           0.258438
         2021-01-03           0.258438
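To see what these numbers represent, the firm intercepts can be recomputed by hand with the usual within (demeaning) estimator. This is a minimal sketch using only pandas and NumPy on the same toy data, not how linearmodels computes them internally. It assumes the model is Y = firm intercept + slopes * X + error, so each firm's intercept is that firm's mean of Y minus the fitted value at the firm's mean X. The names `x_cols`, `firm_means`, `demeaned`, and `firm_intercepts` are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Same toy panel as above (dates left out; they are not needed here).
df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B", "B"],
    "X1": [1, 2, 4, 2, 3, 7],
    "X2": [4, 7, 3, 1, 7, 1],
    "X3": [1, 0, 1, 0, 1, 1],
    "Y": [10, 21, 12, 4, 9, 4],
})

x_cols = ["X1", "X2", "X3"]  # regressors from the formula
cols = x_cols + ["Y"]

# Within transformation: subtract each firm's own mean from every column.
firm_means = df.groupby("firm")[cols].mean()
demeaned = df[cols] - df.groupby("firm")[cols].transform("mean")

# Slopes from OLS on the demeaned data (demeaning removes the firm intercepts).
beta, *_ = np.linalg.lstsq(
    demeaned[x_cols].to_numpy(dtype=float),
    demeaned["Y"].to_numpy(dtype=float),
    rcond=None,
)

# Firm intercept = firm mean of Y minus the fitted part at the firm's mean X.
firm_intercepts = firm_means["Y"] - firm_means[x_cols].to_numpy() @ beta
firm_intercepts.name = "intercept"
print(firm_intercepts.round(6))
```

On this data the two values should agree with the estimated_effects column printed above, one per firm. To get one row per firm from the fitted model itself, something like `result.estimated_effects.groupby(level='firm').first()` should collapse the repeated rows, since the effect is constant within each firm.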