I want to impute values to replace NaNs in a dataframe. The dataframe contains Hour, Location and multiple numeric columns associated with speeds, times etc. In each of the numeric columns containing NaNs, I want to replace the missing value with the average value for the hour and locationcode for that column.
EG The duration in Row 2 would be given the average value of the hour 0, Location BG values (3.15,3.18, 3.22) etc
Hour | Location | Duration |
---|---|---|
0 | BG | 3.15 |
0 | BG | NaN |
0 | WH | 3.12 |
0 | BG | 3.18 |
0 | BG | 3.22 |
1 | WH | 43.1 |
1 | WH | NaN |
1 | BG | 42.5 |
1 | WH | 38.4 |
1 | BG | 37.8 |
I assign
df_mean = df.groupby(['Hour','Location']).mean(numeric_only=True)
with the intention of using this dataframe of means to be a source for the replacment vlue but haven't found any success in solving my issue.
df['Duration'] = df['Duration'].fillna(
df.groupby(['Hour', 'Location'])['Duration'].transform('mean')
)