Search code examples
pythonpandasdataframegroup-by

Filling NaN in dataframe column based on groupby dataframe value


I want to impute values to replace NaNs in a dataframe. The dataframe contains Hour, Location and multiple numeric columns associated with speeds, times etc. In each of the numeric columns containing NaNs, I want to replace the missing value with the average value for the hour and locationcode for that column.

EG The duration in Row 2 would be given the average value of the hour 0, Location BG values (3.15,3.18, 3.22) etc

Hour Location Duration
0 BG 3.15
0 BG NaN
0 WH 3.12
0 BG 3.18
0 BG 3.22
1 WH 43.1
1 WH NaN
1 BG 42.5
1 WH 38.4
1 BG 37.8

I assign

df_mean = df.groupby(['Hour','Location']).mean(numeric_only=True)

with the intention of using this dataframe of means to be a source for the replacment vlue but haven't found any success in solving my issue.


Solution

  • df['Duration'] = df['Duration'].fillna(
        df.groupby(['Hour', 'Location'])['Duration'].transform('mean')
    )