I have found a solution that addresses this by rows, but is there a quick way to do it by column?
Here is a quick sample of the dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame([['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['SF',15.0],
['GB',5.7],
[1.0,np.nan],
['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['BUF',7.0],
['GB',5.7],
[2.0,np.nan]], columns = ['team','points'])
I've been trying to manipulate df['sum'] = df['points'].cumsum(). Obviously it does the cumulative sum, but what I need is for it to restart whenever it reaches a NaN instead of just skipping over it.
Use GroupBy.cumsum with a helper Series that labels each block of rows: flag the missing values with isna() and take a cumsum of those flags, so the label increases by one at every NaN:
df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
print(df)
   team  points     sum
0    GB   43.76   43.76
1   TEN   17.30   61.06
2   ARI    0.20   61.26
3   ATL   12.30   73.56
4   HOU   21.10   94.66
5   ARI    1.70   96.36
6   ATL   12.60  108.96
7    SF   15.00  123.96
8    GB    5.70  129.66
9     1     NaN     NaN
10   GB   43.76   43.76
11  TEN   17.30   61.06
12  ARI    0.20   61.26
13  ATL   12.30   73.56
14  HOU   21.10   94.66
15  ARI    1.70   96.36
16  ATL   12.60  108.96
17  BUF    7.00  115.96
18   GB    5.70  121.66
19    2     NaN     NaN
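To see why the grouping restarts the sum, it can help to look at the helper Series on its own. The following is a minimal sketch on a small made-up frame; the names demo and group_key and the sample values are assumptions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical values, not the question's data).
demo = pd.DataFrame({"points": [1.0, 2.0, np.nan, 3.0, 4.0, np.nan, 5.0]})

# isna() flags the NaN rows; cumsum() over those booleans gives a counter
# that goes up by one at every NaN, so each run of rows shares one label.
group_key = demo["points"].isna().cumsum()

# Cumulative sum within each label. A NaN row opens a new group and stays
# NaN itself, because cumsum skips missing values by default.
demo["sum"] = demo.groupby(group_key)["points"].cumsum()

print(demo.assign(group=group_key))
```

Each NaN row becomes the first row of a new group, which is why the running total starts again from the next valid value.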