
KNN Classifier Python


I am currently using scikit-learn to help with a crime prediction problem. I am having trouble running the knn.predict method on the entire DataFrame that I have.

How can I run knn.predict() on both feature columns of my DataFrame at once and store the output in another DataFrame?

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

knn_df = pd.read_csv("/Users/helenapunset/Desktop/knn_dataframe.csv")

# x is the set of features 
x = knn_df[['latitude', 'longitude']]

# y is the target variable 
y = knn_df['Class']

# train and test data 
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5)

# fit the classifier on the training split
knn.fit(x_train,y_train)

# mean accuracy on the test split was approximately 69%
knn.score(x_test,y_test)

# this is predicted to be a safe zone 
crime_prediction = knn.predict([[25.787882, -80.358427]])
print(crime_prediction)

In the last line of the code I passed in the two features I am using, latitude and longitude, which come from my DataFrame knn_df. But this is a single point. I have been searching through the documentation for a way to run this knn prediction on the entire DataFrame and cannot seem to find one. Is it possible to use a for loop for this?


Solution

  • Suppose the new set of points to be classified is 'knn_df_predict'. Assuming it has the same column names, try the following lines of code:

    x_new = knn_df_predict[['latitude', 'longitude']]  # select the feature columns
    crime_prediction = knn.predict(x_new)  # predict for every row at once
    knn_df_predict['prediction'] = crime_prediction  # add the predictions to the DataFrame
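
    To see the whole flow end to end, here is a minimal, self-contained sketch. The original CSV is not available, so the coordinates, the labelling rule for 'Class', and the 'results' DataFrame name are all made-up assumptions for illustration. It suggests that a for loop should not be needed, since knn.predict() accepts all rows in one call and returns one label per row:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for knn_df (assumption: the real CSV is not available here).
rng = np.random.default_rng(0)
knn_df = pd.DataFrame({
    "latitude": rng.uniform(25.70, 25.90, size=200),
    "longitude": rng.uniform(-80.45, -80.15, size=200),
})
# Hypothetical labelling rule, used only so the example has a target column.
knn_df["Class"] = np.where(knn_df["latitude"] > 25.80, "safe", "unsafe")

features = ["latitude", "longitude"]
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(knn_df[features], knn_df["Class"])

# Hypothetical new points to classify, with the same column names.
knn_df_predict = pd.DataFrame({
    "latitude": rng.uniform(25.70, 25.90, size=10),
    "longitude": rng.uniform(-80.45, -80.15, size=10),
})

# A single call predicts every row: the input has shape (n_samples, 2).
batch_pred = knn.predict(knn_df_predict[features])

# Store the output in a separate DataFrame; assign() returns a copy,
# so knn_df_predict itself is left unchanged.
results = knn_df_predict.assign(prediction=batch_pred)

# The row-by-row loop gives the same labels, only more slowly.
loop_pred = [
    knn.predict(knn_df_predict[features].iloc[[i]])[0]
    for i in range(len(knn_df_predict))
]

print(results)
```

    Passing a DataFrame with the same column names used in fit() also keeps scikit-learn from warning about missing feature names, which can happen when a plain nested list is passed to a model fitted on a DataFrame.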