How to fill one column's missing values conditional on another column's value in Pandas?


I have a dataframe that looks like this:

import numpy as np
import pandas as pd
d = {'col1': [np.nan, 19, 32, np.nan, 54, 67], 'col2': [0, 1, 0, 1, 1, 1]}
df = pd.DataFrame(d)

I want to fill the missing values in "col1" based on the values of "col2". Specifically, I want to fill a missing value in "col1" with 0 if "col2" is 0 in that row, and otherwise leave "col1" as it is. In this case, my output should look like:

d_updated = {'col1': [0, 19, 32, np.nan, 54, 67], 'col2': [0, 1, 0, 1, 1, 1]}
df_updated = pd.DataFrame(d_updated)

To get the above output, I tried to get the index of the rows where "col2" equals 0 and then use fillna():

ix = list(df[df["col2"] == 0].index)
df["col2"].loc[ix].fillna(0, inplace = True)

However, my approach doesn't work and I don't know why. Thanks ahead.


Solution

  • Try using loc with boolean indexing:

    df.loc[(df['col1'].isna()) & (df['col2'] == 0), 'col1'] = df['col2']
    

    Output:

       col1  col2
    0   0.0     0
    1  19.0     1
    2  32.0     0
    3   NaN     1
    4  54.0     1
    5  67.0     1
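
As for why the attempt in the question has no effect: df["col2"].loc[ix] selects "col2" (which has no missing values) rather than "col1", and it produces a new Series, so fillna(..., inplace=True) only changes that temporary object and not df. The sketch below shows two equivalent ways to write the fix with a single assignment on the frame. The names mask, filled_loc and filled_mask are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 19, 32, np.nan, 54, 67],
                   'col2': [0, 1, 0, 1, 1, 1]})

# True only for rows where col1 is missing AND col2 equals 0
mask = df['col1'].isna() & df['col2'].eq(0)

# Variant 1: assign the scalar 0 through one .loc call on the frame itself
filled_loc = df.copy()
filled_loc.loc[mask, 'col1'] = 0

# Variant 2: Series.mask replaces values where the condition is True
filled_mask = df.copy()
filled_mask['col1'] = df['col1'].mask(mask, 0)
```

Both variants leave the missing value in row 3 untouched, because "col2" is 1 there. Assigning the scalar 0 instead of df['col2'] gives the same result here and makes the intent a little more explicit.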