I have a data frame in which there is a column containing several NaN values. The dataframe looks like this:
col_1 col_2
2022-10-31 99.094 102.498
2022-11-30 99.001 101.880
2022-12-31 NaN 108.498
2023-01-31 NaN 100.500
I want to fill those NaN based on the simple calculation below: desired_val = (previous value in col_1 * current value in col_2) / previous value in col_2
which means,
df.loc['2022-12-31', 'col_1'] should be = (99.001 * 108.498) / 101.880 = 105.432
and df.loc['2023-01-31', 'col_1'] should be = (105.432 * 100.500) / 108.498 = 97.660
I found solution by using row by row operation but it is slow when the dataset is big. I tried column wise operation by using this:
df['col_1'] = df['col_1'].fillna(
(df[col_1].shift(1) * df[col_2])
/ df[col_2].shift(1)
)
But it does work only for one row and then it does not go further. Is there any column wise pandas solution for that?
You can think of your operation and see that you multiply by x
in one row and divide by x
in the next row.
Thus you can simplify the result to:
col1_value = (last_valid_col1_value * current_col2_value)
/ col2_value_at_last_valid_col1_position
Which can be translated as:
# is the row a NA?
m1 = df['col_1'].isna()
# is the next row a NA?
m2 = df['col_1'].shift(-1).isna()
df.loc[m1, 'col_1'] = (df['col_1'].div(df['col_2'])
.where(m2 & ~m1).ffill()
.mul(df['col_2'])[m1]
)
Output:
col_1 col_2
2022-10-31 99.094000 102.498
2022-11-30 99.001000 101.880
2022-12-31 105.431984 108.498
2023-01-31 97.659997 100.500
Intermediates:
col_1 col_2 m1 m2 m2&~m1 ffilled(col1/col2) result
2022-10-31 99.094 102.498 False False False NaN NaN
2022-11-30 99.001 101.880 False True True 0.971741 NaN
2022-12-31 NaN 108.498 True True False 0.971741 105.431984
2023-01-31 NaN 100.500 True True False 0.971741 97.659997