I have a data set with many NaN values and I want to fill the missing values in each column with that column's mean. I tried the following code:
def fill_mean():
    m = [df.columns.get_loc(c) for c in df.columns if c in missing]
    for i in m:
        df[df.columns[i]] = df[df.columns[i]].fillna(value=df[df.columns[i]].mean())
    return df
but I get this error:
TypeError: must be str, not int
Each of the columns I'm trying to fill has a single dtype, which is either 'float64' or 'O'.
I suspect that this is the source of the problem, but how can I solve it?
Edit: I created a dictionary that maps each column with missing data to that column's type.
di = dict(zip(missing, m2))
def fill_mean():
    m = [df.columns.get_loc(c) for c in df.columns if c in missing]
    for i in m:
        if di[m] == "dtype('float64')":
            df[df.columns[i]] = df[df.columns[i]].fillna(value=df[df.columns[i]].mean())
    return df
If I run fill_mean() now, I get a different error:
if di[m] == "dtype('float64')":
TypeError: unhashable type: 'list'
I think you want to first cast your columns to type float, then use df.fillna, with df.mean() as the value argument:
df[["columns", "to", "change"]] = df[["columns", "to", "change"]].astype('float')
df = df.fillna(df.mean())
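If some of the columns really are text (dtype 'O') and cannot be cast to float, one option is to restrict the fill to the numeric columns. This is only a minimal sketch: the frame and its column names (height, name) are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric column and one text column, both with gaps.
df = pd.DataFrame({
    "height": [1.0, np.nan, 3.0],
    "name": ["a", None, "c"],
})

# Pick out the numeric columns so mean() is never asked to average strings.
numeric_cols = df.select_dtypes(include="number").columns

# Fill each numeric column's NaN with that column's own mean.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
```

The text column is left untouched, so its missing values would still need a separate strategy (for example a placeholder string).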
Note: If all the columns in your dataframe can be cast to float, then you can simply do:
df = df.astype('float').fillna(df.astype('float').mean())
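astype('float') raises an error if a column contains a value that cannot be parsed as a number. If that is a possibility, a variation is to convert with pd.to_numeric and errors="coerce", which turns unparseable entries into NaN so that they are filled with the mean as well. A rough sketch, assuming a made-up frame (the "oops" value is just an illustration):

```python
import pandas as pd

# Hypothetical frame of numeric strings with a stray non-numeric entry.
df = pd.DataFrame({
    "col1": ["1", "2", "oops", None],
    "col2": ["4", None, "6", "8"],
})

# Convert every column to numbers; anything unparseable becomes NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

# Compute the column means once and use them to fill the gaps.
filled = converted.fillna(converted.mean())
```

Whether silently treating bad values as missing is acceptable depends on your data, so it may be worth inspecting what was coerced before filling.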
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': np.random.choice([np.nan, '1', '2'], 10),
                   'col2': np.random.choice([np.nan, '1', '2'], 10)})
>>> print(df)
  col1 col2
0    2    1
1    2    1
2  nan  nan
3    1    2
4    1    2
5  nan    2
6    2    2
7    2    2
8    1    2
9  nan    1
df[['col1', 'col2']] = df[['col1', 'col2']].astype('float')
df = df.fillna(df.mean())
>>> print(df)
       col1      col2
0  2.000000  1.000000
1  2.000000  1.000000
2  1.571429  1.666667
3  1.000000  2.000000
4  1.000000  2.000000
5  1.571429  2.000000
6  2.000000  2.000000
7  2.000000  2.000000
8  1.000000  2.000000
9  1.571429  1.000000
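As for the second error in the question: di[m] uses the whole list m as a dictionary key, and lists are not hashable, so the lookup fails before any comparison happens. The key should be the column for the current iteration. Comparing a dtype against the string "dtype('float64')" is also fragile, so a dtype check helper is safer. Below is a sketch of how the loop could be repaired, assuming missing is a list of column names; the function signature and the sample frame are my own, not the asker's.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype

def fill_mean(df, missing):
    """Return a copy of df where NaN in the float columns listed in
    `missing` is replaced by that column's mean."""
    out = df.copy()
    for col in missing:
        # Only float columns have a meaningful mean; skip object columns.
        if is_float_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
    return out

# Hypothetical data: "a" is float64, "b" is object.
df = pd.DataFrame({"a": [1.0, np.nan, 5.0], "b": ["x", None, "z"]})
result = fill_mean(df, ["a", "b"])
```

Iterating over the column names directly also avoids the round trip through get_loc and df.columns[i].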