How to populate NaN values based on conditions from two other columns using Pandas?


I have a dataframe that looks something like this:

ID hiqual Wave
1 1.0 g
1 NaN i
1 NaN k
2 1.0 g
2 NaN i
2 NaN k
3 1.0 g
3 NaN i
4 5.0 g
4 NaN i

This is a long-format dataframe, and my hiqual variable is only recorded for the first measurement wave (g). I would like to fill the NaN values in the subsequent measurement waves (i and k) with the value given in wave g for each ID.

I tried using fillna(), but I am not sure how to express the two conditions (ID and Wave) or how to fill based on them. I would be grateful for any help or suggestions.


Solution

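  • If your dataframe is ordered by the ID and Wave columns (the example sorts it first), you can simply forward-fill the values. Note that a plain ffill() does not respect ID boundaries, so this relies on every ID having a non-missing value in wave g: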

    >>> df.sort_values(['ID', 'Wave']).ffill()
       ID  hiqual Wave
    0   1     1.0    g
    1   1     1.0    i
    2   1     1.0    k
    3   2     1.0    g
    4   2     1.0    i
    5   2     1.0    k
    6   3     1.0    g
    7   3     1.0    i
    8   4     5.0    g
    9   4     5.0    i
    

    You can also use the g values explicitly, mapping each ID to its wave-g value:

    g_vals = df[df['Wave']=='g'].set_index('ID')['hiqual']
    df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
    print(df)
    print(g_vals)
    
    # Output
       ID  hiqual Wave
    0   1     1.0    g
    1   1     1.0    i
    2   1     1.0    k
    3   2     1.0    g
    4   2     1.0    i
    5   2     1.0    k
    6   3     1.0    g
    7   3     1.0    i
    8   4     5.0    g
    9   4     5.0    i
    
    # g_vals
    ID
    1    1.0
    2    1.0
    3    1.0
    4    5.0
    Name: hiqual, dtype: float64
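
If some ID could be missing its wave-g value, a plain forward fill would carry the previous ID's value into it. A minimal sketch of a per-ID forward fill is below. It rebuilds the sample data from the question and adds a hypothetical ID 5 (not in the original data) whose wave-g value is NaN, purely to show the difference:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan,
               1.0, np.nan, 5.0, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# Hypothetical extra ID (an assumption for illustration only):
# its wave-g value is missing, so there is nothing valid to propagate.
extra = pd.DataFrame({
    "ID": [5, 5],
    "hiqual": [np.nan, np.nan],
    "Wave": ["g", "i"],
})
df = pd.concat([df, extra], ignore_index=True)

# Sort so that wave g comes first within each ID.
df = df.sort_values(["ID", "Wave"])

# Plain forward fill: ID 5 wrongly inherits the value from ID 4.
plain = df.ffill()

# Forward fill within each ID: ID 5 stays NaN.
filled = df.assign(hiqual=df.groupby("ID")["hiqual"].ffill())

print(filled)
```

For IDs 1 to 4 this gives the same values as the two approaches above. The only difference is that values never cross from one ID into the next.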