Search code examples
pythonpandasfillna

fillna by avoiding row wise operation in pandas


I have a data frame in which there is a column containing several NaN values. The dataframe looks like this:

            col_1   col_2
2022-10-31  99.094  102.498
2022-11-30  99.001  101.880
2022-12-31     NaN  108.498
2023-01-31     NaN  100.500

I want to fill those NaN based on the simple calculation below: desired_val = (previous value in col_1 * current value in col_2) / previous value in col_2

which means,

df.loc['2022-12-31', 'col_1'] should be = (99.001 * 108.498) / 101.880 = 105.432

and df.loc['2023-01-31', 'col_1'] should be = (105.432 * 100.500) / 108.498 = 97.660

I found solution by using row by row operation but it is slow when the dataset is big. I tried column wise operation by using this:

df['col_1'] = df['col_1'].fillna(
        (df[col_1].shift(1) * df[col_2])
        / df[col_2].shift(1)
       )

But it does work only for one row and then it does not go further. Is there any column wise pandas solution for that?


Solution

  • You can think of your operation and see that you multiply by x in one row and divide by x in the next row.

    Thus you can simplify the result to:

    col1_value = (last_valid_col1_value * current_col2_value)
                 / col2_value_at_last_valid_col1_position
    

    Which can be translated as:

    # is the row a NA?
    m1 = df['col_1'].isna()
    # is the next row a NA?
    m2 = df['col_1'].shift(-1).isna()
    
    df.loc[m1, 'col_1'] = (df['col_1'].div(df['col_2'])
                           .where(m2 & ~m1).ffill()
                           .mul(df['col_2'])[m1]
                          )
    

    Output:

                     col_1    col_2
    2022-10-31   99.094000  102.498
    2022-11-30   99.001000  101.880
    2022-12-31  105.431984  108.498
    2023-01-31   97.659997  100.500
    

    Intermediates:

                 col_1    col_2     m1     m2  m2&~m1  ffilled(col1/col2)      result
    2022-10-31  99.094  102.498  False  False   False                 NaN         NaN
    2022-11-30  99.001  101.880  False   True    True            0.971741         NaN
    2022-12-31     NaN  108.498   True   True   False            0.971741  105.431984
    2023-01-31     NaN  100.500   True   True   False            0.971741   97.659997