Search code examples
pythonpandasfeature-engineeringdata-preprocessing

Averaging five rows above the value in the target column


The challenge that I have, and don't know how to approach is to have averaged five, ten, or whatever amount of rows above the target value plus the target row.

Dataset

target |  A  |  B  |
----------------------
  nan  |  6  |  4  |
  nan  |  2  |  7  |  
  nan  |  4  |  9  |
  nan  |  7  |  3  |
  nan  |  3  |  7  |
  nan  |  6  |  8  |
  nan  |  7  |  6  |
   53  |  4  |  5  |
  nan  |  6  |  4  |
  nan  |  2  |  7  |
  nan  |  3  |  3  |
  nan  |  4  |  9  |
  nan  |  7  |  3  |
  nan  |  3  |  7  |
   51  |  1  |  3  |

Desired format:

target |  A  |  B  |
----------------------
   53  | 5.16|6.33 |
   51  |3.33 |5.33 |

Solution

  • Try this, [::-1] reversing element to order the dataframe bottom to top, so we can group the values "above" valid targets:

    df.groupby(df['target'].notna()[::-1].cumsum()[::-1]).apply(lambda x: x.tail(6).mean())
    

    Output:

            target         A         B
    target                            
    1         51.0  3.333333  5.333333
    2         53.0  5.166667  6.333333