
Unexpected Python KeyError


I have loaded a CSV file into a Pandas dataframe:

import pandas as pd

Name     ID    Sex     M_Status    DaysOff
Joe      3      M         S           1
NaN     NaN    NaN       NaN          2
NaN     NaN    NaN       NaN          3

df = pd.read_csv('People.csv')

This data will then be loaded into an HTML file.

test = """

      HTML code

 """

Now for preparing data for the HTML file:

df1 = df.filter(['Name','ID','Sex','M_Status','DaysOff'])

file = ""

for i, rows in df1.iterrows():

   name = (df1['Name'][i])
   id = (df1['ID'][i])
   sex = (df1['Sex'][i])
   m_status = (df1['M_Status'][i])
   days_off = (df1['DaysOff'][i])

   # The with block closes the file automatically, so no explicit close() is needed.
   with open(f"personInfo{i}.html", "w") as file:
      file.write(test.format(name,id,sex,m_status,days_off))

And the error:

KeyError: 'days_off'

Note: This error occurs within the for loop.

Can anyone see where I'm going wrong? As I understand it, this error is raised when you try to access a column by a name that doesn't match any header in the dataframe. However, the header does exist.

Error Information:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'days_off'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-16-35e6b916521b> in <module>
      1     name = (df1['Name'][i])
      2     id =  (df1['ID'][i])
      3     sex = (df1['Sex'][i])
      4     m_status = (df1['M_Status'][i])
----> 5     days_off = (df1['DaysOff'][i])    

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'days_off'

Solution

  • I've resolved this, and what a stupid error it was!

    There was a trailing space at the end of the header name in the CSV file, so the column label was actually 'DaysOff ' rather than 'DaysOff'.

    What pandas was expecting:

    days_off = (df1['DaysOff '][i])
    

    whereas I was giving it:

    days_off = (df1['DaysOff'][i])
    

    A very stupid human error. Thanks to everyone who looked into it, though.
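
    For anyone hitting the same thing, here is a minimal sketch of how the stray space could be spotted and cleaned up rather than typed into the lookup. It assumes the CSV looks roughly like the one in the question; the inline `csv_text` string is a hypothetical stand-in for People.csv, not the original file. Printing the labels as a list shows each one in quotes, which makes trailing whitespace visible, and `df.columns.str.strip()` removes it from every header in one step.

```python
import io
import pandas as pd

# Hypothetical stand-in for People.csv: note the trailing space after "DaysOff".
csv_text = "Name,ID,Sex,M_Status,DaysOff \nJoe,3,M,S,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Printing the labels as a list shows each one quoted, so stray
# whitespace such as 'DaysOff ' becomes easy to spot.
print(df.columns.tolist())

# Strip leading/trailing whitespace from every header before using them.
df.columns = df.columns.str.strip()

# The lookup from the question now works with the clean name.
days_off = df["DaysOff"][0]
print(days_off)
```

    Stripping the headers right after `read_csv` means the rest of the code can use the clean names, instead of having to remember which label carries the extra space.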