Tags: python, scikit-learn, knn, gridsearchcv, mahalanobis

Defining distance parameter (V) in knn crossval grid search (seuclidean/mahalanobis distance metrics)


I am trying to carry out a k-fold cross-validation grid search with the KNN algorithm in Python's scikit-learn, searching over the number of neighbors K and the distance metric. I am including mahalanobis and seuclidean as distance metrics, and I understand these take a parameter that needs to be specified: V for seuclidean (the variance of each feature) and V or VI for mahalanobis (the covariance matrix of the features, or its inverse).
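A short sketch of what each parameter looks like, assuming the iris data as a stand-in for X and using SciPy's distance functions to check the values. Note that `np.cov` needs `rowvar=False` here, because by default it treats each row as a variable.

```python
import numpy as np
from scipy.spatial import distance
from sklearn.datasets import load_iris

X = load_iris().data  # stand-in feature matrix: 150 samples x 4 features

# np.cov treats rows as variables by default, giving a samples-by-samples matrix
print(np.cov(X).shape)  # (150, 150)

# seuclidean: V is a 1-D array holding one variance per feature
V = np.var(X, axis=0, ddof=1)
# mahalanobis: VI is the inverse of the features' covariance matrix
VI = np.linalg.inv(np.cov(X, rowvar=False))
print(V.shape, VI.shape)  # (4,) (4, 4)

u, v = X[0], X[1]
d_seuc = distance.seuclidean(u, v, V)
d_maha = distance.mahalanobis(u, v, VI)

# seuclidean equals plain euclidean distance after dividing each feature by its std dev
d_scaled = distance.euclidean(u / np.sqrt(V), v / np.sqrt(V))
print(np.isclose(d_seuc, d_scaled))  # True
```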

Below is my code:

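```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X, y: feature matrix and class labels, loaded earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, stratify=y)

knn = KNeighborsClassifier()

grid_param = {'n_neighbors': np.arange(1, 51),
              'metric': ['euclidean', 'minkowski', 'mahalanobis', 'seuclidean'],
              'metric_params': [{'V': np.cov(X_train)}]}

knn_gscv = GridSearchCV(knn, grid_param, cv=5)

knn_gscv.fit(X_train, y_train)  # (*)
```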

The (*) line throws this error when executed:

TypeError: __init__() got an unexpected keyword argument 'V'

I have also tried VI instead of V, but I get the same error.
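A minimal reproduction outside the grid search, assuming the iris data as a stand-in: `metric_params` is forwarded as keyword arguments to the chosen distance metric, so the same V that seuclidean accepts makes the euclidean metric fail with the error above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in data
V = np.var(X, axis=0, ddof=1)      # per-feature variances, as seuclidean expects

results = {}
for metric in ['seuclidean', 'euclidean']:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric, metric_params={'V': V})
    try:
        # metric_params reach the metric's constructor when the model is fitted
        knn.fit(X, y).predict(X[:5])
        results[metric] = 'ok'
    except TypeError as exc:
        results[metric] = 'TypeError'
        print(f"{metric}: {exc}")

print(results)  # {'seuclidean': 'ok', 'euclidean': 'TypeError'}
```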

I have come across the potential solutions below, but they don't help.

https://github.com/scikit-learn/scikit-learn/issues/6915

Scikit-learn: How do we define a distance metric's parameter for grid search

Any help appreciated!

This is also my first question, so any feedback in that regard is welcome too.


Solution

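```python
grid_params = [
    # metrics without extra parameters
    {'n_neighbors': np.arange(1, 51), 'metric': ['euclidean', 'minkowski']},
    # mahalanobis: inverse covariance matrix of the features, shape (n_features, n_features)
    {'n_neighbors': np.arange(1, 51), 'metric': ['mahalanobis'],
     'metric_params': [{'VI': np.linalg.inv(np.cov(X_train, rowvar=False))}]},
    # seuclidean: 1-D array of per-feature variances, shape (n_features,)
    {'n_neighbors': np.arange(1, 51), 'metric': ['seuclidean'],
     'metric_params': [{'V': np.var(X_train, axis=0, ddof=1)}]},
]
```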

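The issue is that the euclidean and minkowski metrics do not accept a V parameter. A single dict in the parameter grid is expanded into every combination of its values, so metric_params={'V': ...} was passed to all four metrics, and the euclidean metric's constructor raised the TypeError (VI fails for the same reason). Passing a list of dicts instead makes GridSearchCV search each sub-grid separately, so metric_params is only combined with the metrics that accept it.

Two further details: np.cov treats each row as a variable by default, so the feature covariance matrix is np.cov(X_train, rowvar=False), with shape (n_features, n_features) rather than (n_samples, n_samples). And the two metrics expect different parameters: seuclidean takes V as a 1-D array of per-feature variances, while mahalanobis takes the covariance matrix V or its inverse VI, so they belong in separate sub-grids.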
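A self-contained sketch of the full search, assuming the iris dataset as a stand-in for the question's X and y, and setting error_score='raise' so that any parameter mismatch surfaces immediately instead of being recorded as a NaN score:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; replace with your own X, y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10, stratify=y)

# Metric parameters estimated from the training split (columns are features)
VI = np.linalg.inv(np.cov(X_train, rowvar=False))  # for mahalanobis
V = np.var(X_train, axis=0, ddof=1)                 # for seuclidean

k_values = np.arange(1, 51)
grid_params = [
    {'n_neighbors': k_values, 'metric': ['euclidean', 'minkowski']},
    {'n_neighbors': k_values, 'metric': ['mahalanobis'], 'metric_params': [{'VI': VI}]},
    {'n_neighbors': k_values, 'metric': ['seuclidean'], 'metric_params': [{'V': V}]},
]

# error_score='raise' makes a failing parameter combination raise instead of scoring NaN
knn_gscv = GridSearchCV(KNeighborsClassifier(), grid_params, cv=5, error_score='raise')
knn_gscv.fit(X_train, y_train)

print(knn_gscv.best_params_['metric'], knn_gscv.best_params_['n_neighbors'])
print(f"CV accuracy: {knn_gscv.best_score_:.3f}")
print(f"Test accuracy: {knn_gscv.score(X_test, y_test):.3f}")
```

Like the question's code, this estimates VI and V once from the full training split, so each validation fold contributes to them; estimating them inside each fold would need extra wiring, for example a custom estimator.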