I'm using the Cardiovascular Disease Dataset from Kaggle. The model has been trained, and what I want to do now is label a single input (a row of 13 values) that is entered dynamically.
The dataset has 13 features + 1 target and 66k rows.
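# prepare dataset for train and test
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

dfCardio = pd.read_csv("cleanCardio.csv")
# target column, then the 13 remaining feature columns
y = dfCardio['cardio']
x = dfCardio.drop('cardio',axis = 1, inplace=False)
model = KNeighborsClassifier()
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
model.fit(x_train, y_train)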
# make predictions for test data
y_pred = model.predict(x_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
The model is trained. What I want to do now is predict the label of a single row, i.e. return 0 or 1 for the target. So I wrote this code:
import numpy as np
import pandas as pd
single = np.array(['69','1','151','22','37','0','65','140','90','2','1','0','0','1'])
singledf = pd.DataFrame(single)
prediction = model.predict(final)
But it gives the error: query data dimension must match training data dimension
How can I fix the prediction for a single row? Why am I not able to predict a single case?
Each instance in your dataset has 13 features and 1 label.
x = dfCardio.drop('cardio',axis = 1, inplace=False)
This line removes what I assume is the label column from the data, leaving only the 13 feature columns.
The feature vector you are trying to predict on is 14 elements long. You can only predict on feature vectors that are 13 elements long, because that is what the model was trained on.
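Here is a minimal sketch of one way to fix it, assuming the last of your 14 values is the label and the other 13 are in the same order as the training columns. It uses synthetic training data and made-up column names (f0 to f12) so that it runs on its own; with your real data you would reuse x_train.columns instead. Note also that pd.DataFrame(single) on a 1-D array gives 14 rows by 1 column, so the values have to be reshaped into a single row, and the strings converted to numbers.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the cardio data: 13 numeric features + binary target.
# The column names are hypothetical; use x_train.columns with the real data.
rng = np.random.default_rng(42)
feature_names = [f"f{i}" for i in range(13)]
x_train = pd.DataFrame(rng.integers(0, 200, size=(200, 13)), columns=feature_names)
y_train = rng.integers(0, 2, size=200)

model = KNeighborsClassifier()
model.fit(x_train, y_train)

# The 14 values from the question.
single = np.array(['69','1','151','22','37','0','65','140','90','2','1','0','0','1'])

# A 1-D array becomes one column with 14 rows, not one row.
print(pd.DataFrame(single).shape)  # (14, 1)

# Assumption: the last value is the label, so drop it. Then convert the
# strings to numbers and reshape to 1 row x 13 columns.
features = single[:-1].astype(float).reshape(1, -1)
single_df = pd.DataFrame(features, columns=feature_names)
print(single_df.shape)  # (1, 13)

prediction = model.predict(single_df)
print(prediction[0])  # 0 or 1
```

If the extra value is not the last one, drop whichever position corresponds to the column that is not among the training features.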