
K Nearest Neighbor Python


I am new to data mining. I am trying to implement a KNN classifier on separate training and test datasets. All the tutorials I have seen use the train_test_split method to split a single dataset, but my data is already split into train and test sets. How do I assign the target variable?


Solution

  • I am assuming that your test data is labelled (i.e. logically divided into test_X and test_y), and that you will use it to evaluate the performance of a model trained on the training data.

    1. Load the training data into (train_X, train_y) and the test data into (test_X, test_y), where X holds the feature columns and y holds the target column.

    2. Train your model on the training data

    from sklearn.neighbors import KNeighborsClassifier
    knn_clf = KNeighborsClassifier()
    knn_clf.fit(train_X, train_y)
    
    3. Predict on the test data
    y_pred = knn_clf.predict(test_X)
    
    4. Check the accuracy of the predictions
    import numpy as np
    # Fraction of test samples whose predicted label matches the true label
    accuracy = np.mean(test_y == y_pred)
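To make step 1 concrete, here is a minimal sketch that assumes each dataset is a CSV file in which one column holds the target. The column names `feature_1`, `feature_2` and `label`, the helper `load_split`, and the inline CSV text are hypothetical stand-ins; with real files you would pass a path such as `"train.csv"` to `pd.read_csv` instead of wrapping a string in `io.StringIO`.

```python
import io

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical CSV contents standing in for your own train and test files.
# "label" is an assumed name for the target column; replace it with yours.
TRAIN_CSV = """feature_1,feature_2,label
1.0,1.1,a
1.2,0.9,a
0.8,1.0,a
5.0,5.2,b
5.1,4.9,b
4.9,5.0,b
"""
TEST_CSV = """feature_1,feature_2,label
1.1,1.0,a
5.0,5.1,b
"""

TARGET = "label"


def load_split(csv_text, target=TARGET):
    """Read a CSV and separate the feature columns (X) from the target column (y)."""
    df = pd.read_csv(io.StringIO(csv_text))
    y = df[target]                    # the target variable
    X = df.drop(columns=[target])     # every other column is a feature
    return X, y


# 1. Load train and test data, each split into features and target
train_X, train_y = load_split(TRAIN_CSV)
test_X, test_y = load_split(TEST_CSV)

# 2. Train on the training set only (n_neighbors=3 because the toy set is tiny)
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(train_X, train_y)

# 3. Predict on the test set
y_pred = knn_clf.predict(test_X)

# 4. Accuracy = fraction of test rows predicted correctly
accuracy = np.mean(test_y.to_numpy() == y_pred)
print(y_pred, accuracy)
```

The target is picked out by column name in each file separately, so no call to train_test_split is needed. The test file should contain the same feature columns, in the same order, as the training file.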
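Since the question mentions implementing KNN, it may help to see roughly what `fit` and `predict` amount to. The sketch below is a hypothetical from-scratch version (`knn_predict` is not part of any library) that uses Euclidean distance and a majority vote over the `k` nearest training points. scikit-learn's `KNeighborsClassifier` defaults to `n_neighbors=5` and the Minkowski metric with `p=2` (which is Euclidean distance), but it may use tree-based neighbor search and has its own tie-breaking, so treat this as an illustration rather than a reimplementation.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, test_X, k=5):
    """Minimal k-nearest-neighbor classifier: Euclidean distance + majority vote."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    test_X = np.asarray(test_X, dtype=float)
    preds = []
    for x in test_X:
        # Euclidean distance from this test point to every training point
        dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(dists)[:k]
        # Majority vote among their labels; Counter breaks ties in favor of the
        # label encountered first, i.e. the one with the closer neighbor
        # (a choice of this sketch, not necessarily scikit-learn's rule)
        label = Counter(train_y[nearest]).most_common(1)[0][0]
        preds.append(label)
    return np.array(preds)


# Same toy data as a plain list of feature rows and a list of labels
train_X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [5.0, 5.2], [5.1, 4.9], [4.9, 5.0]]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [[1.1, 1.0], [5.0, 5.1]]
test_y = np.array(["a", "b"])

y_pred = knn_predict(train_X, train_y, test_X, k=3)
accuracy = np.mean(test_y == y_pred)
```

In this sketch "training" is nothing more than keeping the training set around; all the work happens at prediction time, when each test point is compared against every stored training point.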