python pandas keyerror

KeyError while trying to write an additional field in a Python Pandas dataframe


I want to add a calculated field 'Score' to the dataframe positions_deposits.

When I run the following loop on the pandas dataframe positions_deposits,

for i in range(len(positions_deposits)):
    <Read some values from the dataframe which would be passed to a function in the next line>
    Score = RAG_function (Amber_threshold, Red_threshold, Type_threshold, Values)
    positions_deposits['Score'].loc[i] = Score

I get the following error. What mistake am I making, and how do I resolve it?

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Score'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-201-7d0481b84aa4> in <module>
      6     Values = positions_deposits['Values'].loc[i]
      7 #     Score = RAG_function (Amber_threshold, Red_threshold, Type_threshold, Values)
----> 8     positions_deposits["Score"].loc[i] = RAG_function (Amber_threshold, Red_threshold, Type_threshold, Values)
      9 
     10 #     print("Score is %i.00" %Score)

~/.local/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'Score'

Please note: if I print(Score) instead, there is no error. So RAG_function itself runs fine; it is the assignment to the dataframe that fails.

Thanks!


Solution

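  • The KeyError is raised because positions_deposits['Score'] tries to read the 'Score' column before it exists; the column has to be created before you can index into it. You'll probably also want to read up on how .loc and .iloc work. Having said that, there is a better way that avoids the explicit loop altogether: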

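    import pandas
    import random

    # Sample data: 100 rows of random integers in columns "A" and "B".
    df = pandas.DataFrame([{"A": random.randint(0,100), "B": random.randint(0,100)} for _ in range(100)])

    # Stand-in for RAG_function: receives one row and returns its score.
    def rag_function(row):
        A = row["A"]
        B = row["B"]
        return A * B

    # axis=1 calls rag_function once per row; the results become the new column.
    df["Score"] = df.apply(rag_function, axis=1)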
    

    NOTE: I don't have your RAG_function, so I've made up a simple function in its place. The idea is that you apply your function to every row of the dataframe and assign the result to the new column in one step.
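
If you would rather keep the explicit loop from the question, here is a minimal sketch of how it could be made to work. The threshold columns and the body of rag_function below are made up for illustration (the real RAG_function was not posted); only the column names 'Values' and 'Score' come from the question. The two changes are creating the 'Score' column before the loop and assigning with a single .loc[row, column] call instead of the chained positions_deposits['Score'].loc[i].

```python
import pandas as pd


def rag_function(amber_threshold, red_threshold, value):
    # Hypothetical stand-in for the asker's RAG_function.
    if value >= red_threshold:
        return 3
    if value >= amber_threshold:
        return 2
    return 1


# Made-up sample data; only 'Values' and 'Score' are names from the question.
positions_deposits = pd.DataFrame({
    "Amber_threshold": [10, 10, 10],
    "Red_threshold": [20, 20, 20],
    "Values": [5, 15, 25],
})

# Reading a column that does not exist yet is what raises KeyError: 'Score'.
try:
    positions_deposits["Score"]
except KeyError as err:
    print("KeyError:", err)

# Create the column first so there is something to write into.
positions_deposits["Score"] = 0

for i in positions_deposits.index:
    row = positions_deposits.loc[i]
    # One .loc call with both row and column labels writes to the dataframe itself.
    positions_deposits.loc[i, "Score"] = rag_function(
        row["Amber_threshold"], row["Red_threshold"], row["Values"]
    )

print(positions_deposits)
```

The apply version above is still shorter and usually the better choice; the loop is shown only because it stays close to the original code.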