I am experimenting with the way the weights on the distance affect the performance of the kNN algorithm and for a reproducible example I am working with the iris dataset.
To my surprise, weighting two predictors 100 times more heavily than the other two produces predictions identical to those of the unweighted model. What explains this rather counterintuitive finding?
My code is the following:
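import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
iris = load_iris()
X_original = iris['data']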
Y = iris['target']
sc = StandardScaler() # Defines the parameters of the Scaler
X = sc.fit_transform(X_original) # Transforms the original data to standardized data and returns them
from sklearn.model_selection import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(n_splits = 1, train_size = 0.8, test_size = 0.2)
split = sss.split(X, Y)
s = list(split)
train_index = s[0][0]
test_index = s[0][1]
X_train = X[train_index]
X_test = X[test_index]
Y_train = Y[train_index]
Y_test = Y[test_index]
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 6)
iris_fit = knn.fit(X_train, Y_train) # The data can be passed as numpy arrays or pandas dataframes/series.
# All the data should be numeric
# There should be no NaNs
predictions_w1 = knn.predict(X_test)
weights = np.array([1, 1, 100, 100])
weights = weights / np.sum(weights)
knn_w = KNeighborsClassifier(n_neighbors = 6, metric='wminkowski', p=2,
metric_params={'w': weights})
iris_fit_w = knn_w.fit(X_train, Y_train) # The data can be passed as numpy arrays or pandas dataframes/series.
# All the data should be numeric
# There should be no NaNs
predictions_w100 = knn_w.predict(X_test)
(predictions_w1 != predictions_w100).sum()
0
They are not always the same. Add a random state to your train/test split and you will see that the result changes for different values:
StratifiedShuffleSplit(n_splits = 1, train_size = 0.8, test_size = 0.2, random_state=3)
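To check this over several splits, here is a minimal sketch that counts how many test predictions differ between the weighted and unweighted models for a range of seeds. It assumes the 'wminkowski' definition in which, for p=2, the weights multiply the coordinate differences, so scaling the columns by the weights and using the default Euclidean metric should be equivalent. That also sidesteps the 'wminkowski' metric name, which recent scikit-learn releases no longer accept, as far as I know. The helper `n_different` and the seed range are my own choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris["data"])
Y = iris["target"]

# Same 1:1:100:100 weighting as in the question, normalised to sum to 1.
w = np.array([1, 1, 100, 100]) / 202.0


def n_different(seed, n_neighbors=6):
    """Count test points where weighted and unweighted kNN disagree (hypothetical helper)."""
    sss = StratifiedShuffleSplit(n_splits=1, train_size=0.8, test_size=0.2,
                                 random_state=seed)
    train_index, test_index = next(sss.split(X, Y))
    plain = KNeighborsClassifier(n_neighbors=n_neighbors)
    plain.fit(X[train_index], Y[train_index])
    # Assumption: Euclidean distance on column-scaled data stands in for
    # the weighted metric with p=2.
    weighted = KNeighborsClassifier(n_neighbors=n_neighbors)
    weighted.fit(X[train_index] * w, Y[train_index])
    pred_plain = plain.predict(X[test_index])
    pred_weighted = weighted.predict(X[test_index] * w)
    return int((pred_plain != pred_weighted).sum())


diffs = {seed: n_different(seed) for seed in range(10)}
print(diffs)  # number of disagreeing test predictions per seed
```

Dividing the weights by their sum rescales every distance by the same factor, so it should not change which neighbours are selected.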
Additionally, the weighted Minkowski distance with such extreme weights on the 3rd (petal length) and 4th (petal width) features gives essentially the same results as running KNN on only these two features with the unweighted Minkowski distance. Since these two features are quite informative, it is no surprise that you get very similar results to the case where all 4 features are considered. See the wiki picture below.
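The petal-only claim can be checked directly. The sketch below compares the weighted model with a model fitted on the two petal columns alone, for one fixed split. As an assumption, the weighting is again reproduced by scaling the columns (equivalent to the p=2 weighted metric if the weights multiply the coordinate differences). Under that definition, a sepal difference enters the squared distance with a factor of (1/100)^2 = 1e-4 relative to a petal difference of the same size. The `predict` helper is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris["data"])
Y = iris["target"]
w = np.array([1, 1, 100, 100]) / 202.0

sss = StratifiedShuffleSplit(n_splits=1, train_size=0.8, test_size=0.2,
                             random_state=3)
train_index, test_index = next(sss.split(X, Y))


def predict(features, n_neighbors=6):
    """Fit kNN on the training rows of `features` and predict the test rows."""
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(features[train_index], Y[train_index])
    return knn.predict(features[test_index])


pred_all = predict(X)            # all four features, unweighted
pred_weighted = predict(X * w)   # petal features weighted 100x (assumed equivalence)
pred_petal = predict(X[:, 2:])   # petal length and petal width only

# Relative contribution of a sepal difference to the squared distance.
sepal_factor = (w[0] / w[2]) ** 2

print("weighted vs petal-only disagreements:", int((pred_weighted != pred_petal).sum()))
print("weighted vs unweighted disagreements:", int((pred_weighted != pred_all).sum()))
print("accuracy (all, weighted, petal-only):",
      [float((p == Y[test_index]).mean()) for p in (pred_all, pred_weighted, pred_petal)])
```

Any remaining disagreement between the weighted and petal-only models would most likely come from ties in the petal measurements, where the tiny sepal contribution decides the neighbour order.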