time1 x y z GMT- 5 key time2 a b c GMT cut_off time_diff new_column
1 1.674841e+09 -1.10 64.11 -1.33 2023-01-27 12:43:22 PM 0 1.674841e+09 2.96 606.270614 2.80 2023-01-27 12:43:24 PM 1.674841e+09 2.308100 NaN
2 1.674841e+09 -1.10 64.11 -1.33 2023-01-27 12:43:22 PM 0 1.674841e+09 2.96 584.696883 2.80 2023-01-27 12:43:26 PM 1.674841e+09 4.303636 NaN
3 1.674841e+09 -1.10 64.11 -1.33 2023-01-27 12:43:22 PM 0 1.674841e+09 2.96 615.295633 2.80 2023-01-27 12:43:28 PM 1.674841e+09 6.298568 NaN
4 1.674841e+09 -1.10 64.11 -1.33 2023-01-27 12:43:22 PM 0 1.674841e+09 2.96 587.050575 2.80 2023-01-27 12:43:30 PM 1.674841e+09 8.293623 NaN
5 1.674841e+09 -2.24 93.51 -2.36 2023-01-27 12:43:46 PM 0 1.674841e+09 2.96 584.700016 2.80 2023-01-27 12:43:46 PM 1.674841e+09 0.007554 0.007554
100 1.674842e+09 -1.24 84.73 -2.44 2023-01-27 12:49:07 PM 0 1.674843e+09 2.30 1024.363758 2.64 2023-01-27 01:13:11 PM 1.674843e+09 1444.068500 NaN
101 1.674842e+09 -1.24 84.73 -2.44 2023-01-27 12:49:07 PM 0 1.674843e+09 2.31 1011.438119 2.64 2023-01-27 01:13:13 PM 1.674843e+09 1446.063470 NaN
102 1.674842e+09 -1.24 84.73 -2.44 2023-01-27 12:49:07 PM 0 1.674843e+09 2.32 1005.181835 2.64 2023-01-27 01:13:15 PM 1.674843e+09 1448.058710 NaN
103 1.674842e+09 -1.24 84.73 -2.44 2023-01-27 12:49:07 PM 0 1.674843e+09 2.34 989.515657 2.64 2023-01-27 01:13:17 PM 1.674843e+09 1450.053643 NaN
104 1.674842e+09 -1.24 84.73 -2.44 2023-01-27 12:49:07 PM 0 1.674843e+09 2.34 1016.183097 2.64 2023-01-27 01:13:19 PM 1.674843e+09 1452.048679 NaN
105 1.674842e+09 -1.57 80.04 -1.96 2023-01-27 12:49:06 PM 0 1.674842e+09 2.02 1652.185708 2.88 2023-01-27 12:49:06 PM 1.674842e+09 0.001867 0.001867
We actually need the row without nan value in column: 'new_column". here it is the rows: 5 and 105 Bu we need the average of 'x', 'y', 'z' of the (rows 1 to 5) and (rows 100 to 105) in the 5th row and 105th row
The desired output:
time1 x y z GMT- 5 key time2 a b c GMT cut_off time_diff new_column
5 1.674841e+09 -1.328 69.99 -1.536 2023-01-27 12:43:46 PM 0 1.674841e+09 2.96 584.700016 2.80 2023-01-27 12:43:46 PM 1.674841e+09 0.007554 0.007554
105 1.674842e+09 -1.295 69.82 -2.36 2023-01-27 12:49:06 PM 0 1.674842e+09 2.02 1652.185708 2.88 2023-01-27 12:49:06 PM 1.674842e+09 0.001867 0.001867
First lets try and create a group. We can use cumulative sum to do this on the "new column". Just replace the Nan values with 0 and the others with 1 and shift it down by 1
df["binary"] = df["new_column"].fillna(0)
df.loc[df.binary!=0,"binary"] = 1
df["binary"] = df["binary"].shift(1,fill_value=0)
df["cumsum"] = df["binary"].cumsum()
time1 x y z ... time_diff new_column binary cumsum
0 1.670000e+09 -1.10 64.11 -1.33 ... 2.308100 NaN 0.0 0.0
1 1.670000e+09 -1.10 64.11 -1.33 ... 4.303636 NaN 0.0 0.0
2 1.670000e+09 -1.10 64.11 -1.33 ... 6.298568 NaN 0.0 0.0
3 1.670000e+09 -1.10 64.11 -1.33 ... 8.293623 NaN 0.0 0.0
4 1.670000e+09 -2.24 93.51 -2.36 ... 0.007554 0.007554 0.0 0.0
5 1.670000e+09 -1.24 84.73 -2.44 ... 1444.068500 NaN 1.0 1.0
6 1.670000e+09 -1.24 84.73 -2.44 ... 1446.063470 NaN 0.0 1.0
7 1.670000e+09 -1.24 84.73 -2.44 ... 1448.058710 NaN 0.0 1.0
8 1.670000e+09 -1.24 84.73 -2.44 ... 1450.053643 NaN 0.0 1.0
9 1.670000e+09 -1.24 84.73 -2.44 ... 1452.048679 NaN 0.0 1.0
10 1.670000e+09 -1.57 80.04 -1.96 ... 0.001867 0.001867 0.0 1.0
After this it is a simple groupby on the cumulative sum
G = df.groupby("cumsum")
df["x_avg"] = G['x'].transform('mean')
df["y_avg"] = G['y'].transform('mean')
df["z_avg"] = G['z'].transform('mean')
filtered_df = df[~pd.isna(df["new_column"])]
time1 x y z ... cumsum x_avg y_avg z_avg
4 1.670000e+09 -2.24 93.51 -2.36 ... 0.0 -1.328 69.990000 -1.536
10 1.670000e+09 -1.57 80.04 -1.96 ... 1.0 -1.295 83.948333 -2.360