Tags: python, pandas, dataframe, fillna

Fill NaN in pandas with per-group averages based on another column


I have a DataFrame:

import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "cat", "cat"],
    "weight": [5, 4, 3, 7, None],
    "length": [12, None, 13, 14, 15],
})
   species  weight  length
 0     cat     5.0    12.0
 1     dog     4.0     NaN
 2     dog     3.0    13.0
 3     cat     7.0    14.0
 4     cat     NaN    15.0

and I want to fill the missing data with the average for the species, i.e.,

df.loc[1, "length"] = 13  # the average dog length
df.loc[4, "weight"] = 6   # (5+7)/2, the average cat weight

How do I do that?

(presumably I need to pass value=DataFrame to df.fillna, but I don't see an easy way to construct the frame)


Solution

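  • Use df.fillna(df.groupby('species').transform('mean')). The transform gives a frame of per-species means aligned to df's index, and fillna uses it to replace only the missing cells. It does not modify df; it returns a new frame: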

      species  weight  length
    0     cat     5.0    12.0
    1     dog     4.0    13.0
    2     dog     3.0    13.0
    3     cat     7.0    14.0
    4     cat     6.0    15.0
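
For reference, here is the one-liner above split into two steps, as a minimal sketch that assumes a recent pandas version. The names group_means and filled are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "cat", "cat"],
    "weight": [5, 4, 3, 7, None],
    "length": [12, None, 13, 14, 15],
})

# Per-species means, broadcast back onto the original row index.
# The grouping column "species" is not part of the result, so
# group_means (an illustrative name) has only "weight" and "length".
group_means = df.groupby("species").transform("mean")

# fillna aligns on index and column labels, so only the NaN cells
# are replaced. A new frame is returned; df itself is unchanged.
filled = df.fillna(group_means)

print(filled)
```

To keep the result, assign it back (df = df.fillna(...)). If every value in a group is missing for some column, the group mean is itself NaN, so those cells stay unfilled.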