Search code examples
pythonpandasdataframemissing-datafillna

How to use each vector entry to fill NAN's of a separate groups in a dataframe


Say I have a vector ValsHR which looks like this:

valsHR=[78.8, 82.3, 91.0]

And I have a dataframe MainData

Age  Patient  HR             
21   1        NaN
21   1        NaN
21   1        NaN
30   2        NaN
30   2        NaN
24   3        NaN
24   3        NaN
24   3        NaN 

I want to fill the NaNs so that the first value in valsHR will only fill in the NaNs for patient 1, the second will fill the NaNs for patient 2 and the third will fill in for patient 3.

So far I've tried using this: mainData['HR'] = mainData['HR'].fillna(ValsHR) but it fills all the NaNs with the first value in the vector.

I've also tried to use this: mainData['HR'] = mainData.groupby('Patient').fillna(ValsHR) fills the NaNs with values that aren't in the valsHR vector at all.

I was wondering if anyone knew a way to do this?


Solution

  • Create dictionary by Patient values with missing values, map to original column and replace missing values only:

    print (df)
       Age  Patient     HR
    0   21        1    NaN
    1   21        1    NaN
    2   21        1    NaN
    3   30        2  100.0 <- value is not replaced
    4   30        2    NaN
    5   24        3    NaN
    6   24        3    NaN
    7   24        3    NaN
    
    
    p = df.loc[df.HR.isna(), 'Patient'].unique()
    valsHR = [78.8, 82.3, 91.0]
    
    df['HR'] = df['HR'].fillna(df['Patient'].map(dict(zip(p, valsHR))))
    print (df)
       Age  Patient     HR
    0   21        1   78.8
    1   21        1   78.8
    2   21        1   78.8
    3   30        2  100.0
    4   30        2   82.3
    5   24        3   91.0
    6   24        3   91.0
    7   24        3   91.0
    

    If some groups has no NaNs:

    print (df)
       Age  Patient     HR
    0   21        1    NaN
    1   21        1    NaN
    2   21        1    NaN
    3   30        2  100.0 <- group 2 is not replaced
    4   30        2  100.0 <- group 2 is not replaced
    5   24        3    NaN
    6   24        3    NaN
    7   24        3    NaN
    
    
    p = df.loc[df.HR.isna(), 'Patient'].unique()
    valsHR = [78.8, 82.3, 91.0]
    
    df['HR'] = df['HR'].fillna(df['Patient'].map(dict(zip(p, valsHR))))
    print (df)
       Age  Patient     HR
    0   21        1   78.8
    1   21        1   78.8
    2   21        1   78.8
    3   30        2  100.0
    4   30        2  100.0
    5   24        3   82.3
    6   24        3   82.3
    7   24        3   82.3