I have this pandas dataframe that looks like this:
index up_walk down_walk up_avg down_avg
0 0.000000 17.827148 0.36642 9.06815
1 1.550781 0.000000 NaN NaN
2 0.957031 0.000000 NaN NaN
3 0.000000 2.878906 NaN NaN
I wanted to calculate the missing values that currently are NAN by this formula:
df['up_avg'][i] = df['up_avg'][i-1] * 12 + df['up_walk'][i]
explanation: I want to calculate for every row the value based on the previous row in the same column, plus the value in the current row from a different column. And that for every row with missing values. and Continue this calculation to the end of the dataframe. In this case, I have a dependency in every new row calculation that is based on the previous up_avg value calculation.
The problem is that using a loop is very slow because of the large dataframe(10K)
Can anyone please help implement a lambda function for this?
if this is not possible, can anyone share a script for an efficient loop?
I tried a lot of things with no success like this:
df['up_avg'] = df.apply(lambda x: pd.Series(np.where((x.up_avg != None), x.up_avg.shift() * 12 + x.up_walk, x.up_avg)))
got an error - "AttributeError: 'Series' object has no attribute 'up_avg'"
and also using shift to create new columns and then using a lambda function with no success
I expect that my dataframe will look like this at the end:
index up_walk down_walk up_avg down_avg
0 0.000000 17.827148 0.36642 9.06815
1 1.550781 0.000000 5.947821 108.8178
2 0.957031 0.000000 72.330883 1305.8136
3 0.000000 2.878906 867.970596 15672.642106
Thanks a lot!
you can use np.roll instead of shift. Also, if you are using apply, you must specify an axis:
#keep going until there is no nan value left
status=True
while status:
df['up_avg'] = np.where((np.isnan(df.up_avg)==True), np.roll(df.up_avg,1) * 12 +df.up_walk ,df.up_avg)
if df['up_avg'].isnull().sum() == 0:
status=False
status=True
while status:
df['down_avg'] = np.where((np.isnan(df.down_avg)==True), np.roll(df.down_avg,1) * 12 +df.down_walk ,df.down_avg)
if df['down_avg'].isnull().sum() == 0:
status=False
print(df)
up_walk down_walk up_avg down_avg
0 0.0 17.827148 0.36642 9.06815
1 1.550781 0.0 5.947821 108.81779999999999
2 0.957031 0.0 72.330883 1305.8136
3 0.0 2.878906 867.970596 15672.642106
```