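I want to replace the null values in each column with that column's mean, df[col].mean(), but only when df[col] is not entirely null. Columns that are entirely null should be filled with 0.0 instead.
I implemented it like this: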
if train_x[cols].isna().sum() == len(train_x):  # need to fix
    train_x.loc[:, cols] = train_x[cols].fillna(value=0.0)
else:
    train_x.loc[:, cols] = train_x[cols].fillna(value=train_x[cols].mean())
This code raises an error because train_x[cols] is a DataFrame, so the condition is evaluated for all columns at once, whereas I need to check it for each column separately.
Is there a better way to do this?
Sorry for my poor English skills.
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
    {"col1": [1, 9, pd.NA], "col2": [pd.NA, pd.NA, pd.NA], "col3": [8, 4, 3]}
)
print(df)
# Output
   col1  col2  col3
0     1  <NA>     8
1     9  <NA>     4
2  <NA>  <NA>     3
Here is one way to do it:
for col in df.columns:
    if df[col].isna().sum() == df.shape[0]:
        df[col] = 0
    else:
        df[col] = df[col].fillna(df[col].mean())
Then:
print(df)
# Output
   col1  col2  col3
0   1.0     0     8
1   9.0     0     4
2   5.0     0     3
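If the columns are numeric and use NaN for missing values, a loop-free variant may also work. The following is only a sketch under stated assumptions: it uses a float version of the toy dataframe (np.nan instead of pd.NA), and the names means and filled are illustrative. It relies on DataFrame.mean() skipping missing values and returning NaN for a column that is entirely missing, so that mean can be swapped for 0 before filling.

```python
import numpy as np
import pandas as pd

# Assumption: float columns with np.nan as the missing-value marker.
df = pd.DataFrame(
    {
        "col1": [1.0, 9.0, np.nan],
        "col2": [np.nan, np.nan, np.nan],
        "col3": [8.0, 4.0, 3.0],
    }
)

# Per-column means; missing values are skipped, and a column that is
# entirely missing gets a NaN mean, which is replaced by 0 here.
means = df.mean().fillna(0.0)

# Passing a Series to fillna fills each column with the value whose
# index label matches the column name.
filled = df.fillna(means)

print(filled)
```

For the dataframe in the question, the same idea could be written as train_x[cols] = train_x[cols].fillna(train_x[cols].mean().fillna(0.0)), assuming cols is a list of numeric column names.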