I'm new to machine learning and am using the Boston dataset for predictions. Everything works except the calls to precision_score and accuracy_score. This is what I have done:
import pandas as pd
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing,cross_validation, svm
from sklearn.datasets import load_boston
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
boston = load_boston()
df = pd.DataFrame(boston.data)
df.columns= boston.feature_names
df['Price']= boston.target
X = np.array(df.drop(['Price'],axis=1), dtype=np.float64)
X = preprocessing.scale(X)
y = np.array(df['Price'], dtype=np.float64)
print (len(X[:,6:7]),len(y))
X_train,X_test,y_train,y_test=cross_validation.train_test_split(X,y,test_size=0.30)
clf =LinearRegression()
clf.fit(X_train,y_train)
y_predict = clf.predict(X_test)
print(y_predict,len(y_predict))
print (accuracy_score(y_test, y_predict))
print(precision_score(y_test, y_predict,average = 'macro'))
Now I get the following error:
  File "LinearRegression.py", line 33, in <module>
    accuracy = accuracy_score(y_test, y_predict)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: continuous is not supported
You are using a linear regression model,
clf = LinearRegression()
which predicts continuous values, e.g. 1.2 or 1.3.
accuracy_score(y_test, y_predict), on the other hand,
expects class labels: boolean values (1 or 0, true or false) or categorical values like 1, 2, 3, 4, where the numbers act as categories. It counts exact matches between true and predicted labels, which is not meaningful for continuous predictions.
That's why you are getting the error.
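To make the difference concrete, here is a minimal sketch with made-up numbers (the arrays are illustrative, not taken from the Boston data). It shows accuracy_score working on discrete labels and raising a ValueError on continuous values. The exact wording of the error message may vary between scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Discrete class labels: accuracy is the fraction of exact matches.
y_true_labels = np.array([0, 1, 1, 0])
y_pred_labels = np.array([0, 1, 0, 0])
label_accuracy = accuracy_score(y_true_labels, y_pred_labels)
print(label_accuracy)  # 3 of 4 labels match

# Continuous targets, like the output of a regressor (illustrative values).
y_true_cont = np.array([21.6, 34.7, 33.4])
y_pred_cont = np.array([25.1, 30.6, 30.2])
try:
    accuracy_score(y_true_cont, y_pred_cont)
    error_message = None
except ValueError as exc:
    # scikit-learn rejects continuous targets for classification metrics.
    error_message = str(exc)
print(error_message)
```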
How to solve this?
Since you are trying to predict Price
on the Boston data, which is a continuous value, I recommend changing your error measure from accuracy to a regression metric such as MSE (mean squared error) or RMSE (its square root).
Replace:
print(accuracy_score(y_test, y_predict))
with:
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_predict))
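That will solve the accuracy_score error. Note that precision_score on the next line of your script is also a classification metric and will fail the same way on continuous predictions, so remove it or replace it with a regression metric as well.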
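For reference, a self-contained sketch of the regression workflow with MSE and RMSE. It makes a few assumptions that are not in your script: it uses synthetic data generated with NumPy instead of the Boston dataset (load_boston is not available in recent scikit-learn releases), and it imports train_test_split from sklearn.model_selection, which replaced the old cross_validation module. The coefficients and random seed are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption: stands in for the Boston features/prices).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_predict = reg.predict(X_test)

# MSE: mean of squared differences; RMSE: same quantity in the target's units.
mse = mean_squared_error(y_test, y_predict)
rmse = np.sqrt(mse)
print("MSE:", mse)
print("RMSE:", rmse)
```

Lower values are better for both metrics. RMSE is often easier to interpret because it is expressed in the same units as the target (here, the price).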