python, machine-learning, scikit-learn, artificial-intelligence

sklearn: Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample


Hey there, I'm using LabelEncoder and OneHotEncoder in a sample machine learning project, but an error appears when the OneHotEncoder step runs: "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample." The column I'm encoding has only two values, Negative or Positive.

What does this error message mean, and how do I fix it?

# read the data set from a CSV file
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv('diab.csv')
feature=dataset.iloc[:,:-1].values
lablel=dataset.iloc[:,-1].values

# convert string data to binary
# transform the string data in the lablel column to 0/1
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

lab=LabelEncoder()
lablel=lab.fit_transform(lablel)
onehotencoder=OneHotEncoder()
lablel=onehotencoder.fit_transform(lablel).toarray()



# split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(feature,lablel,test_size=0.30)



# fit an SVM to the training set
from sklearn.svm import SVC
classifier=SVC(kernel='linear',random_state=0)
classifier.fit(x_train,y_train)

y_pred=classifier.predict(x_test)


#making the confusion matrix 
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(y_test, y_pred)

from sklearn.neighbors import KNeighborsClassifier

my_classifier=KNeighborsClassifier()

my_classifier.fit(x_train,y_train)
prediction=my_classifier.predict(x_test)

print(prediction)


from sklearn.metrics import accuracy_score
print (accuracy_score(y_test,prediction))

plot=plt.plot((prediction), 'b', label='GreenDots')
plt.show()

Solution

  • I suspect the issue is that you have two possible labels and are encoding them as separate one-hot columns. OneHotEncoder expects a 2D array, but LabelEncoder returns a 1D array, which is what triggers the reshape error. Reshaping would silence the error, but an SVM predicts a single value per sample, so the labels should be a single value per sample as well. Rather than mapping the labels to one-hot vectors, use 1 when the label is positive and 0 when it is negative; the output of LabelEncoder already has this form.
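
As a minimal sketch of that suggestion, the snippet below encodes the labels with LabelEncoder only and passes the resulting 1D array straight to SVC. The tiny `features` and `raw_labels` arrays are made-up stand-ins for the contents of diab.csv, which is not shown in the question, so treat the data as an assumption rather than the asker's real values.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Hypothetical stand-in for the last column of diab.csv.
raw_labels = np.array(['Negative', 'Positive', 'Positive',
                       'Negative', 'Positive', 'Negative'])
# Hypothetical single-feature matrix with shape (6, 1).
features = np.array([[1.0], [5.0], [6.0], [1.5], [5.5], [0.5]])

# LabelEncoder returns a 1D integer array; classes are sorted
# alphabetically, so 'Negative' becomes 0 and 'Positive' becomes 1.
encoder = LabelEncoder()
labels = encoder.fit_transform(raw_labels)

# This is the 2D shape the error message asks for. OneHotEncoder would
# accept it, but the classifier below does not need one-hot labels.
labels_2d = labels.reshape(-1, 1)

# SVC expects y as a 1D array with one value per sample.
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(features, labels)
predictions = classifier.predict(features)

# Map the integer predictions back to the original strings.
decoded = encoder.inverse_transform(predictions)
print(labels, labels_2d.shape, decoded)
```

The same `labels` array can be passed to `train_test_split` and `KNeighborsClassifier` in place of the one-hot matrix; no OneHotEncoder step is needed for a binary target.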