Search code examples
pythonpandascumsumfillna

pandas fillna in column with cumsum of previous rows (reset after every nan)


I have found a solution that addresses this by rows, but is there a quick way to do it by column?

Here is a quick sample of the dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame([['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['SF',15.0],
['GB',5.7],
[1.0,np.nan],
['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['BUF',7.0],
['GB',5.7],
[2.0,np.nan]], columns = ['team','points'])

I've been trying to manipulate df['sum'] = df['points'].cumsum(). Obviously it does the cumlutive sum, but what I need it to do is to restart when/if gets to a nan instead of just skipping it over.


Solution

  • Use GroupBy.cumsum with helper Series created by check missing value by another cumsum:

    df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
    print (df)
       team  points     sum
    0    GB   43.76   43.76
    1   TEN   17.30   61.06
    2   ARI    0.20   61.26
    3   ATL   12.30   73.56
    4   HOU   21.10   94.66
    5   ARI    1.70   96.36
    6   ATL   12.60  108.96
    7    SF   15.00  123.96
    8    GB    5.70  129.66
    9     1     NaN     NaN
    10   GB   43.76   43.76
    11  TEN   17.30   61.06
    12  ARI    0.20   61.26
    13  ATL   12.30   73.56
    14  HOU   21.10   94.66
    15  ARI    1.70   96.36
    16  ATL   12.60  108.96
    17  BUF    7.00  115.96
    18   GB    5.70  121.66
    19    2     NaN     NaN