I am trying to implement a KNN model using Mahalanobis as the distance metric, but when I execute the code I get an error:
ValueError: Mahalanobis dist: size of V does not match
where V is the covariance matrix of features.
Relevant parts of my code below:
X_train, X_test, y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=10,stratify=y)
knn2=KNeighborsClassifier(n_neighbors=20, metric='mahalanobis', metric_params={'V': np.cov(X_train)})
knn2.fit(X_train,y_train) # this is the line that causes the error.
I have looked at sklearn's distance metric code on GitHub (the Mahalanobis implementation starts at line 628) and can see that the error is raised by the following:
cdef inline DTYPE_t rdist(self, DTYPE_t* x1, DTYPE_t* x2,
                          ITYPE_t size) nogil except -1:
    if size != self.size:
        with gil:
            raise ValueError('Mahalanobis dist: size of V does not match')
I've worked out what self.size is in my case, but can't work out what size is.
Could anyone help with this error?
Thanks
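Pass the argument rowvar=False to np.cov and it should work. By default np.cov treats each row as a variable and each column as an observation, so np.cov(X_train) returns an n_samples x n_samples matrix, whereas the metric needs the n_features x n_features covariance of the features. Your KNN constructor should look like this: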
knn2=KNeighborsClassifier(n_neighbors=20, metric='mahalanobis', metric_params={'V': np.cov(X_train, rowvar=False)})
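To see the shape difference directly, here is a minimal sketch using made-up data. The array of 70 samples with 4 features is an assumption standing in for X_train, not the asker's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_train: 70 samples (rows), 4 features (columns).
X_train = rng.normal(size=(70, 4))

# Default rowvar=True: each ROW is treated as a variable.
cov_default = np.cov(X_train)
# rowvar=False: each COLUMN (feature) is treated as a variable.
cov_features = np.cov(X_train, rowvar=False)

print(cov_default.shape)   # (70, 70)
print(cov_features.shape)  # (4, 4)
```

Only the second matrix has one row and column per feature. In the check quoted in the question, size is presumably the length of each sample vector (the number of features) and self.size the dimension of V, which would explain why the default np.cov output fails that comparison.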