Tags: python, pandas, numpy, for-loop, keyerror

For-Loop Index KeyError when trying to get sums


I want to find the sum of a variable for each classification. One variable is a classification (CM), the other is a float (zahlgesnet). I'm getting an error that I can't make sense of.

This is for calculating the sums of a confusion matrix output. I have tried other methods, but I want to know why this one is not working. Since I'm a Python beginner, I'm not sure whether this approach is suitable at all.

This is what the data looks like:

   ID  zahlgesnet  CM
1   1    2.234,42   0
3   2           0   3
4   3         234   0
6   4       8.234   2
7   5      653,23   1
9   6         134   2

And this is my code:

SummeFF = 0
SummeFT = 0
SummeTF = 0
SummeTT = 0
result = 0

def getsums(x,y,z,v):
    for i in range(len(X_valid)):
        if X_valid.CM[i] == 0:
            x += (X_valid.zahlgesnet[i])
        elif X_valid.CM[i] == 1:
            y += (X_valid.zahlgesnet[i])
        elif X_valid.CM[i] == 2:
            z += (X_valid.zahlgesnet[i])
        elif X_valid.CM[i] == 3:
            v += (X_valid.zahlgesnet[i])

getsums(SummeFF,SummeFT,SummeTF,SummeTT)            
print(SummeFF,SummeFT,SummeTF,SummeTT)

I expected the function to iterate through all the values and return four different sums, based on the classifier CM.

The error I get is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-788203eee12b> in <module>
     20 
     21 
---> 22 getsums(SummeFF,SummeFT,SummeTF,SummeTT)
     23 print(SummeFF,SummeFT,SummeTF,SummeTT)
     24 

<ipython-input-40-788203eee12b> in getsums(x, y, z, v)
      7 def getsums(x,y,z,v):
      8     for i in range(len(X_valid)):
----> 9         if X_valid.CM[i] == 0:
     10             x += (X_valid.zahlgesnet[i])
     11         elif X_valid.CM[i] == 1:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
   1066         key = com.apply_if_callable(key, self)
   1067         try:
-> 1068             result = self.index.get_value(self, key)
   1069 
   1070             if not is_scalar(result):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4728         k = self._convert_scalar_indexer(k, kind="getitem")
   4729         try:
-> 4730             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4731         except KeyError as e1:
   4732             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 2

Does it have something to do with the index of my DataFrame not following 1, 2, 3, 4, 5, 6, 7, 8, 9, 10? Is there a way to fix it? The index is the result of a lot of slicing of the DataFrame earlier. Thanks a lot in advance!


Solution

  • Is one of these what you need?

    df.groupby('CM')['zahlgesnet'].transform('sum')
    

    Output

    1    2.234,42234
    3              0
    4    2.234,42234
    6       8.234134
    7         653,23
    9       8.234134
    
    df.groupby('CM')['zahlgesnet'].sum()
    

    Output

    CM
    0    2.234,42234
    1         653,23
    2       8.234134
    3              0
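
On the KeyError itself: the traceback ends in an Int64 hash-table lookup, so X_valid.CM[i] is searching for a row labelled i rather than the i-th row, and after slicing some of those labels no longer exist. The sketch below is not part of the original answer. It rebuilds a hypothetical X_valid from the sample rows in the question, reproduces the failing lookup, and then computes one sum per class. It assumes zahlgesnet is stored as German-formatted text (values such as 2.234,42234 in the output above look like concatenated strings rather than numeric sums), so it converts the column to floats first; if the column is already numeric, that step can be skipped.

```python
import pandas as pd

# Hypothetical stand-in for X_valid, rebuilt from the sample rows in the question.
# The gaps in the index (1, 3, 4, 6, 7, 9) mimic what earlier slicing leaves behind.
X_valid = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6],
        "zahlgesnet": ["2.234,42", "0", "234", "8.234", "653,23", "134"],
        "CM": [0, 3, 0, 2, 1, 2],
    },
    index=[1, 3, 4, 6, 7, 9],
)

# X_valid["CM"][i] looks up the row *labelled* i, not the i-th row, so any i
# that is missing from the index raises KeyError.
try:
    X_valid["CM"][2]
except KeyError as err:
    print("KeyError:", err)

# Positional access works regardless of the labels ...
third_cm = X_valid["CM"].iloc[2]

# ... and reset_index(drop=True) renumbers the rows 0..n-1, so a loop over
# range(len(...)) finds every label again.
renumbered = X_valid.reset_index(drop=True)

# Assumption: zahlgesnet holds German-formatted text ("." for thousands,
# "," for decimals). Convert it to float before summing.
amounts = (
    renumbered["zahlgesnet"]
    .astype(str)
    .str.replace(".", "", regex=False)
    .str.replace(",", ".", regex=False)
    .astype(float)
)

# One sum per CM class, without an explicit loop.
sums = amounts.groupby(renumbered["CM"]).sum()
print(sums)
```

Even with the index fixed, the loop in the question cannot update SummeFF and the other globals: x += ... only rebinds the function's local names, and getsums returns nothing. Returning the four sums from the function, or using groupby as shown, avoids that.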