Search code examples
pythonpandasdataframelambda

Calculating column value based on previous row and column using lambda function


I have this pandas dataframe that looks like this:

index up_walk    down_walk   up_avg  down_avg
  0   0.000000   17.827148  0.36642   9.06815
  1   1.550781    0.000000      NaN       NaN
  2   0.957031    0.000000      NaN       NaN
  3   0.000000    2.878906      NaN       NaN

I wanted to calculate the missing values that currently are NAN by this formula:

df['up_avg'][i] = df['up_avg'][i-1] * 12 + df['up_walk'][i]

explanation: I want to calculate for every row the value based on the previous row in the same column, plus the value in the current row from a different column. And that for every row with missing values. and Continue this calculation to the end of the dataframe. In this case, I have a dependency in every new row calculation that is based on the previous up_avg value calculation.

The problem is that using a loop is very slow because of the large dataframe(10K)

Can anyone please help implement a lambda function for this?

if this is not possible, can anyone share a script for an efficient loop?

I tried a lot of things with no success like this:

df['up_avg'] = df.apply(lambda x: pd.Series(np.where((x.up_avg != None), x.up_avg.shift() * 12 + x.up_walk, x.up_avg)))

got an error -  "AttributeError: 'Series' object has no attribute 'up_avg'"

and also using shift to create new columns and then using a lambda function with no success

I expect that my dataframe will look like this at the end:

index up_walk    down_walk   up_avg  down_avg
  0   0.000000   17.827148  0.36642   9.06815
  1   1.550781    0.000000  5.947821  108.8178
  2   0.957031    0.000000  72.330883 1305.8136
  3   0.000000    2.878906  867.970596  15672.642106

Thanks a lot!


Solution

  • you can use np.roll instead of shift. Also, if you are using apply, you must specify an axis:

    #keep going until there is no nan value left
    status=True
    while status:
        df['up_avg'] = np.where((np.isnan(df.up_avg)==True), np.roll(df.up_avg,1) * 12 +df.up_walk ,df.up_avg)
        if df['up_avg'].isnull().sum() == 0:
            status=False
            
    status=True
    while status:
        df['down_avg'] = np.where((np.isnan(df.down_avg)==True), np.roll(df.down_avg,1) * 12 +df.down_walk ,df.down_avg)
        if df['down_avg'].isnull().sum() == 0:
            status=False
    
    print(df)
        up_walk   down_walk     up_avg      down_avg
    0   0.0       17.827148     0.36642     9.06815
    1   1.550781  0.0           5.947821    108.81779999999999
    2   0.957031  0.0           72.330883   1305.8136
    3   0.0       2.878906      867.970596  15672.642106
    
    ```