python, regression, statsmodels, keyerror

Statsmodels Regression Using pandas DataFrame Column Names: KeyError


I'm trying to run an OLS regression on data stored in a pandas DataFrame. My column dtypes are as follows, in case that matters:

    Date                    datetime64[ns]
    Description             object
    Original Description    object
    Amount                  float64
    Transaction Type        object
    Category                object
    Account Name            object
    Labels                  object
    Notes                   float64

I'm trying to get statsmodels to fit an OLS model and print the summary using this code:

import statsmodels.api as sm

endog = df_dumm['Amount']
exog = sm.add_constant(df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"])

mod = sm.OLS(df_dumm['Amount'], exog)
reg=mod.fit()
print(reg.summary())

I keep getting this error but am not sure what it means. Could someone help me understand what I'm doing wrong and how to fix it?


KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('Date', 'Transaction Type_credit', 'Transaction Type_debit')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2364\1982670734.py in <module>
      2 
      3 endog = df_dumm['Amount']
----> 4 exog = sm.add_constant(df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"])
      5 
      6 df_dumm.exog_names[:] = df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"]

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
   3632             except TypeError:
   3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: ('Date', 'Transaction Type_credit', 'Transaction Type_debit')

Solution

  • You're not selecting multiple columns correctly here:

    exog = sm.add_constant(df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"])
    

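    Separating the names with commas inside single brackets passes one tuple as the key, so pandas looks for a single column whose label is the whole tuple ('Date', 'Transaction Type_credit', 'Transaction Type_debit'). No such column exists, which is why the KeyError prints that tuple. To select several columns, pass a list of names instead (note the double brackets):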

    exog = sm.add_constant(df_dumm[["Date","Transaction Type_credit", "Transaction Type_debit"]])
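A minimal sketch of the fix, using a small made-up DataFrame in place of the question's df_dumm (all values are hypothetical). It shows the tuple key raising the same KeyError, the list key working, and two extra steps that go beyond the KeyError itself and are assumptions about what the regression would likely need next: turning the datetime64 `Date` column into a number (days since the first date, an arbitrary choice) and keeping only one of the two dummies, assuming every transaction is either a credit or a debit.

```python
import pandas as pd

# Hypothetical data shaped like the question's df_dumm (values are made up).
df_dumm = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-03", "2022-01-05", "2022-01-09", "2022-01-12"]),
    "Amount": [25.0, 120.0, 8.5, 60.0],
    "Transaction Type_credit": [0, 1, 0, 1],
    "Transaction Type_debit": [1, 0, 1, 0],
})

cols = ["Date", "Transaction Type_credit", "Transaction Type_debit"]

# df_dumm["a", "b", "c"] is the same as df_dumm[("a", "b", "c")]:
# one tuple key, looked up as a single column label, so it raises KeyError.
tuple_error = None
try:
    df_dumm["Date", "Transaction Type_credit", "Transaction Type_debit"]
except KeyError as err:
    tuple_error = err

# A list inside the brackets selects several columns and returns a DataFrame.
X = df_dumm[cols].copy()

# Assumption: OLS needs numeric input, so express Date as days since the first date.
X["Date"] = (X["Date"] - X["Date"].min()).dt.days

# Assumption: each row is either credit or debit, so keep only one dummy to
# avoid perfect collinearity with the constant column that add_constant adds.
X = X.drop(columns="Transaction Type_credit").astype(float)

print(X.dtypes)
```

The resulting X could then go into the call from the question, for example `sm.OLS(df_dumm["Amount"], sm.add_constant(X)).fit()`. If Date were left as datetime64, mixing it with numeric columns would produce an object array, and statsmodels would most likely reject it with a "Pandas data cast to numpy dtype of object" error.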