python data-science knn training-data test-data

K Nearest Neighbor Python

I am new to data mining and I am trying to implement a KNN classifier on separate training and testing datasets. All the tutorials I have seen use the train_test_split method to split a single dataset, whereas my data is already split into Train and Test. How do I assign the target variable?

Solution

I am assuming that your test data is labelled, i.e. it can be divided into test_X (features) and test_y (target), so that you can use it to measure the performance of the model you trained on the training data.

Load train data into (train_X, train_y) and load test data into (test_X, test_y)
Train your model with train data

from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier()
knn_clf.fit(train_X, train_y)

Predict on test data

y_pred = knn_clf.predict(test_X)

Check accuracy of predictions

import numpy as np
accuracy = np.mean(test_y == y_pred)
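
To address the "how do I assign the target variable" part directly, here is a minimal end-to-end sketch of the steps above. It assumes the two datasets are CSV files read with pandas and that the target sits in a column named `label`; the inline CSV text, the column names and the helper `split_features_target` are all made up for illustration, so substitute your own files and column name.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-ins for your two files; in practice you would call
# pd.read_csv("train.csv") and pd.read_csv("test.csv") instead.
TRAIN_CSV = """f1,f2,label
1.0,1.1,a
1.2,0.9,a
0.9,1.0,a
5.0,5.1,b
5.2,4.9,b
4.9,5.0,b
"""
TEST_CSV = """f1,f2,label
1.1,1.0,a
5.1,5.0,b
"""

TARGET = "label"  # assumed name of the target column


def split_features_target(df, target):
    """Return (X, y): every column except `target`, and the `target` column."""
    return df.drop(columns=[target]), df[target]


train_df = pd.read_csv(io.StringIO(TRAIN_CSV))
test_df = pd.read_csv(io.StringIO(TEST_CSV))

# The same split is applied to both datasets, so the feature columns line up.
train_X, train_y = split_features_target(train_df, TARGET)
test_X, test_y = split_features_target(test_df, TARGET)

# Fit on the training data only, then predict on the held-out test data.
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(train_X, train_y)
y_pred = knn_clf.predict(test_X)

# accuracy_score computes the same fraction as np.mean(test_y == y_pred).
accuracy = accuracy_score(test_y, y_pred)
print(accuracy)
```

The important point is that the target is just one column of each dataset: drop it to get X and select it to get y, using the same column name for both Train and Test.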