I'm trying to run an OLS regression on data stored in a pandas DataFrame. Here are my column dtypes, in case that matters:
Date                    datetime64[ns]
Description             object
Original Description    object
Amount                  float64
Transaction Type        object
Category                object
Account Name            object
Labels                  object
Notes                   float64
I'm trying to get the statsmodels package to fit an OLS model and print the summary using this code:
import statsmodels.api as sm
endog = df_dumm['Amount']
exog = sm.add_constant(df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"])
mod = sm.OLS(df_dumm['Amount'], exog)
reg=mod.fit()
print(reg.summary())
I keep getting this error but am not sure what it means. Could someone help me understand what I'm doing wrong and how to fix it?
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('Date', 'Transaction Type_credit', 'Transaction Type_debit')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_2364\1982670734.py in <module>
2
3 endog = df_dumm['Amount']
----> 4 exog = sm.add_constant(df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"])
5
6 df_dumm.exog_names[:] = df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"]
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
-> 3631 raise KeyError(key) from err
3632 except TypeError:
3633 # If we have a listlike key, _check_indexing_error will raise
KeyError: ('Date', 'Transaction Type_credit', 'Transaction Type_debit')
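You're not selecting multiple columns properly. Inside single brackets, the comma-separated names are packed into one tuple, so pandas looks for a single column whose label is the whole tuple ('Date', 'Transaction Type_credit', 'Transaction Type_debit'). No such column exists, which is why the KeyError shows exactly that tuple. The problem is this line: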
exog = sm.add_constant(df_dumm["Date","Transaction Type_credit", "Transaction Type_debit"])
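To see the tuple lookup in isolation, here is a minimal sketch on a small made-up frame (the column names a, b and c are placeholders, not from the question). The single-bracket form raises a KeyError whose key is the tuple, while a list of names returns a DataFrame:

```python
import pandas as pd

# Hypothetical toy frame; the column names are placeholders.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# df["a", "b"] means df[("a", "b")]: one tuple key, not two column names.
caught_key = None
try:
    df["a", "b"]
except KeyError as err:
    caught_key = err.args[0]  # the missing label is the whole tuple
    print("KeyError:", err)

# A list inside the brackets selects several columns as a DataFrame.
subset = df[["a", "b"]]
print(list(subset.columns))
```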
To select several columns, pass a list of names instead (note the double brackets):
exog = sm.add_constant(df_dumm[["Date","Transaction Type_credit", "Transaction Type_debit"]])