python pandas group-by running-count

Setting the last n non-NaN values per group to NaN


I have a DataFrame with (several) grouping variables and (several) value variables. My goal is to set the last n non-NaN values in each group to NaN. Here is a simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id':[1,1,1,2,2,],
                  'value':[1,2,np.nan, 9,8]})
df
Out[1]:
id  value
0   1   1.0
1   1   2.0
2   1   NaN
3   2   9.0
4   2   8.0

The desired result for n=1 would look like the following:

Out[53]:
id  value
0   1   1.0
1   1   NaN
2   1   NaN
3   2   9.0
4   2   NaN

Solution

  • Use Series.where with groupby().cumcount():

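    N = 1
    # Group only the rows that actually hold a value
    groups = df.loc[df['value'].notna()].groupby('id')
    # 0-based position of each non-NaN row within its group
    enum = groups.cumcount()
    # Number of non-NaN rows in each group, repeated for every row
    sizes = groups['value'].transform('size')

    # Keep all but the last N non-NaN rows; everything else becomes NaN
    df['value'] = df['value'].where(enum < sizes - N)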

    Output:

       id  value
    0   1    1.0
    1   1    NaN
    2   1    NaN
    3   2    9.0
    4   2    NaN
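
Since the question mentions several grouping and value variables, the same idea can be wrapped in a small helper that loops over the value columns. This is only a sketch: the function name `mask_last_n_valid`, its parameters, and the two-column example frame are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def mask_last_n_valid(df, group_cols, value_cols, n=1):
    """Return a copy of df with the last n non-NaN entries per group set to NaN.

    Hypothetical helper: applies the where/cumcount approach column by column.
    """
    out = df.copy()
    for col in value_cols:
        # Group only the rows where this column holds a value.
        groups = out.loc[out[col].notna()].groupby(group_cols)
        # 0-based position of each non-NaN row within its group.
        pos = groups.cumcount()
        # Number of non-NaN rows in the group, repeated for every row.
        sizes = groups[col].transform('size')
        # Rows absent from the condition (the NaN rows) are treated as False,
        # so they simply stay NaN.
        out[col] = out[col].where(pos < sizes - n)
    return out


# Illustrative data with two value columns (an assumption, not from the question).
df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'a': [1, 2, np.nan, 9, 8],
                   'b': [np.nan, 5, 6, 7, np.nan]})
result = mask_last_n_valid(df, ['id'], ['a', 'b'], n=1)
print(result)
```

Each column is handled independently, so the "last n" rows are counted per column rather than per row. With n=1, column a should keep only 1.0 and 9.0, and column b should keep only 5.0. A group with n or fewer non-NaN values ends up entirely NaN in that column.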