Tags: pandas, statsmodels, panel-data

Including a categorical variable in PanelOLS


For my PanelOLS model, I would like to include categorical variables. This is my model:

import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import PanelOLS

exog_vars = ['x1', 'x2', 'x3']
exog = sm.add_constant(df[exog_vars])
mod = PanelOLS(df.y, exog, entity_effects=True, time_effects=True)
result = mod.fit(cov_type='clustered', cluster_entity=True)

The categorical variable is a number identifying an industry. This number is stored in my dataframe (df['x4']). Do you know how to include categorical variables, or do you need more information to answer the question?

I tried:

df['x4'] = pd.Categorical(gesamt.x4)

mod = PanelOLS(gesamt.CAR, exog, other_effects=df['x4'], entity_effects=True, time_effects=True)

The following error occurred:

raise ValueError('At most two effects supported.')

ValueError: At most two effects supported.


Solution

  • The simplest way to do this is probably to one-hot-encode your column x4.

    If you have

    df = pd.DataFrame({'x1': [1, 2, 3], 'x4': ['bob', 'cat', 'cat']})
    df
    

    which looks like

       x1   x4
    0   1  bob
    1   2  cat
    2   3  cat
    

    then

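    # columns= selects what to encode; dtype=int gives 0/1 (pandas >= 2.0 defaults to True/False)
    pd.get_dummies(df, columns=['x4'], dtype=int)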
    

    gives you

       x1  x4_bob  x4_cat
    0   1       1       0
    1   2       0       1
    2   3       0       1
    

    Alternatively,

    df['x4'] = pd.Categorical(df.x4).codes
    df
    

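    will give you integer codes (note that a regression would treat these as one ordinary numeric variable, not as separate categories)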

       x1  x4
    0   1   0
    1   2   1
    2   3   1
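
To connect the one-hot approach back to the panel setup in the question, here is a minimal sketch of building the regressor matrix. The toy data, the `firm`/`year` index names, and the variable name `industry` are assumptions made for illustration; only `x1` and `x4` come from the question.

```python
import pandas as pd

# Hypothetical toy panel: two firms observed over two years.
# Column names x1 and x4 mirror the question; the values are made up.
df = pd.DataFrame(
    {
        "firm": ["A", "A", "B", "B"],
        "year": [2019, 2020, 2019, 2020],
        "x1": [1.0, 2.0, 3.0, 4.0],
        "x4": [10, 20, 20, 30],  # industry stored as a number
    }
).set_index(["firm", "year"])  # (entity, time) MultiIndex

# Encode only x4. drop_first=True omits one level as the reference
# category, which avoids perfect collinearity among the dummies.
industry = pd.get_dummies(df["x4"], prefix="x4", drop_first=True, dtype=float)

# Combine the numeric regressors with the industry dummies.
exog = pd.concat([df[["x1"]], industry], axis=1)
print(exog)
```

The resulting `exog` could then be passed to `PanelOLS` in place of the one in the question, keeping `entity_effects=True` and `time_effects=True` and leaving out `other_effects`. One caveat worth checking in your own data: if an entity's industry never changes over time, the industry dummies are collinear with the entity effects and cannot be estimated separately.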