I am trying to fill NaN in multiple columns (too many for the solution to be hardcoded) that share part of their name with values from another column in the same pandas dataframe.
I know that I can fill multiple columns using a constant value and also that I can fill a single column using another from the same dataframe. It's the combination of these two that is not working for me.
For example, consider the data frame:
df = pd.DataFrame({'Val': [1.2,5.4,3.1,4], 'Col - 1': [None,5,1,None], 'Col - 2': [None,None,6,None]})
print(df)
Val Col - 1 Col - 2
0 1.2 NaN NaN
1 5.4 5.0 NaN
2 3.1 1.0 6.0
3 4.0 NaN NaN
Filling multiple columns with a constant value works:
df.loc[:,df.columns.str.contains('Col')] = df.loc[:,df.columns.str.contains('Col')].fillna(value=15)
print(df)
Val Col - 1 Col - 2
0 1.2 15.0 15.0
1 5.4 5.0 15.0
2 3.1 1.0 6.0
3 4.0 15.0 15.0
Filling a single column with values from another column also works:
df['Col - 2'] = df['Col - 2'].fillna(value=df['Val'])
print(df)
Val Col - 1 Col - 2
0 1.2 NaN 1.2
1 5.4 5.0 5.4
2 3.1 1.0 6.0
3 4.0 NaN 4.0
What doesn't work is a combination of the two:
df.loc[:,df.columns.str.contains('Col')] = df.loc[:,df.columns.str.contains('Col')].fillna(value=df['Val'])
The above does nothing and returns the original dataframe. What I am expecting is this:
Val Col - 1 Col - 2
0 1.2 1.2 1.2
1 5.4 5.0 5.4
2 3.1 1.0 6.0
3 4.0 4.0 4.0
You should adding apply
lambda
, since dataframe fillna
will also check the columns
name , you fill it by the pd.Series
, which do not match the columns
, so will make the fillna failed
df.loc[:,df.columns.str.contains('Col')].apply(lambda x : x.fillna(value=df['Val']))