I'm fairly new to programming, and I have a problem with KNNImputer: it fills every NaN with the same value (the column mean). I'm trying to do hot-deck imputation using this function:
import numpy as np
from sklearn.impute import KNNImputer
kopia_brak8_1 = dane_brak8.copy()
imputerHD = KNNImputer(missing_values=np.nan, n_neighbors=3)
kopia_brak8_1['alcohol'] = imputerHD.fit_transform(kopia_brak8_1['alcohol'].values.reshape(-1, 1))
print("HOT-DECK\n")
print(kopia_brak8_1)
If anybody knows what I'm doing wrong, I'd appreciate help :)
I tried searching on the Internet, but I couldn't find anyone with the same problem. I think it has to do with the way I'm referring to the 'alcohol' column, and that's why it keeps giving me the same value, but frankly I have no idea how to change that.
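You're running into this issue because you've supplied only a single column to the KNNImputer. KNN finds neighbors by measuring distances between rows, and it can only measure distance on features that are observed in both rows. When 'alcohol' is the only column, a row where it is missing shares no observed feature with any other row, so every distance is undefined (NaN). In that case KNNImputer falls back to the column mean, which is why every gap gets the same value instead of a local neighborhood average. You'll need to give it more data to compute distances on.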
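A minimal sketch on made-up numbers (the array below is illustrative, not your `dane_brak8`) reproduces the fallback: with one column, both gaps receive the same value, the mean of the observed entries.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy single-column data (invented for illustration) with two gaps
alcohol = np.array([[9.0], [10.0], [np.nan], [12.0], [14.0], [np.nan]])

imputer = KNNImputer(missing_values=np.nan, n_neighbors=3)
single = imputer.fit_transform(alcohol)

# A row missing its only feature has no coordinate in common with any donor,
# so all its distances are NaN and KNNImputer substitutes the column mean.
# Both gaps become 11.25, the mean of 9, 10, 12 and 14.
print(single.ravel())
```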
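One approach is to reset the index of your column, so the row index becomes a second column that KNN can measure distance on, and then pull the second column (the imputed values) back out after the imputer runs. Each missing value then becomes the average of the rows closest to it by index position: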
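import numpy as np
from sklearn.impute import KNNImputer
kopia_brak8_1 = dane_brak8.copy()
imputerHD = KNNImputer(missing_values=np.nan, n_neighbors=3)
# reset_index() adds the row index as a column, giving KNN a feature to measure distance on
impute_result = imputerHD.fit_transform(kopia_brak8_1['alcohol'].reset_index())
# Column 0 is the former index; column 1 holds the imputed 'alcohol' values
kopia_brak8_1['alcohol'] = impute_result[:, 1]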
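To see what the `reset_index()` trick does, here is a small sketch on a toy DataFrame standing in for `dane_brak8` (the values are invented). Each gap is filled with the average of the three rows nearest to it by index, so the two gaps get different values instead of the column mean of 13.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy stand-in for dane_brak8 (values made up for illustration)
df = pd.DataFrame({"alcohol": [9.0, np.nan, 10.0, 12.0, 14.0, 20.0, np.nan]})

imputer = KNNImputer(missing_values=np.nan, n_neighbors=3)

# reset_index() yields a DataFrame with columns ["index", "alcohol"],
# so distances between rows are measured along the row index.
result = imputer.fit_transform(df["alcohol"].reset_index())

# Row 1 gets the mean of rows 0, 2, 3 -> (9 + 10 + 12) / 3
# Row 6 gets the mean of rows 5, 4, 3 -> (20 + 14 + 12) / 3
df["alcohol"] = result[:, 1]
print(df)
```

This only makes sense if row order is meaningful in your data; if it isn't, passing other related numeric columns from your DataFrame to the imputer gives it more meaningful neighbors.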