python machine-learning scikit-learn linear-regression

precision_score and accuracy_score raise a ValueError

I'm new to machine learning and am using the Boston dataset for predictions. Everything works fine except the calls to precision_score and accuracy_score. This is what I have done:

import pandas as pd 
import sklearn 
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing,cross_validation, svm
from sklearn.datasets import load_boston
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

boston = load_boston()
df = pd.DataFrame(boston.data)
df.columns= boston.feature_names
df['Price']= boston.target

X = np.array(df.drop(['Price'],axis=1), dtype=np.float64)
X = preprocessing.scale(X)

y = np.array(df['Price'], dtype=np.float64)

print (len(X[:,6:7]),len(y))

X_train,X_test,y_train,y_test=cross_validation.train_test_split(X,y,test_size=0.30)

clf =LinearRegression()
clf.fit(X_train,y_train)
y_predict = clf.predict(X_test)

print(y_predict,len(y_predict))
print (accuracy_score(y_test, y_predict))
print(precision_score(y_test, y_predict,average = 'macro'))

Now I get the following error:

File "LinearRegression.py", line 33, in
  accuracy = accuracy_score(y_test, y_predict)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
  y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
  raise ValueError("{0} is not supported".format(y_type))

ValueError: continuous is not supported

Solution

You are using a linear regression model:

clf = LinearRegression()

which predicts continuous values, e.g. 1.2 or 1.3.

accuracy_score(y_test, y_predict), on the other hand, expects discrete class labels: either binary values (1 or 0, true or false) or categorical values such as 1, 2, 3, 4, where the numbers act as categories. The same holds for precision_score.

That's why you are getting an error.
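
To see the difference in isolation, here is a minimal sketch. The arrays are made-up values, not taken from the Boston data, and the variable names are only for illustration. It calls accuracy_score once with class labels and once with continuous values:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Discrete class labels: accuracy is the fraction of exact matches.
y_true_labels = np.array([0, 1, 1, 0])
y_pred_labels = np.array([0, 1, 0, 0])
acc = accuracy_score(y_true_labels, y_pred_labels)
print(acc)  # 3 of 4 labels match -> 0.75

# Continuous targets (made-up prices): accuracy_score rejects them.
y_true_cont = np.array([21.5, 30.1, 18.7, 24.3])
y_pred_cont = np.array([22.0, 29.4, 19.1, 25.0])
try:
    accuracy_score(y_true_cont, y_pred_cont)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The second call fails in the same way as in your traceback, because the inputs are detected as continuous rather than as class labels.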

How to solve this?

Since you are trying to predict Price on the Boston data, which is a continuous value, I recommend changing your error measure from accuracy to a regression metric such as MSE or RMSE.

Replace:

print(accuracy_score(y_test, y_predict))

with:

from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_predict))

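That fixes the accuracy_score error. The precision_score call fails for the same reason, so remove it as well or replace it with another regression metric.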
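
If it helps to see what the metric measures, here is a rough sketch that computes MSE and RMSE by hand in plain Python. The helper name mean_squared_error_by_hand and the numbers are my own, purely for illustration. With default arguments, sklearn's mean_squared_error returns this same mean of squared differences.

```python
import math

def mean_squared_error_by_hand(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up prices and predictions, for illustration only.
y_test = [24.0, 21.0, 35.0, 18.0]
y_predict = [26.0, 20.0, 32.0, 18.0]

mse = mean_squared_error_by_hand(y_test, y_predict)
rmse = math.sqrt(mse)  # RMSE is in the same units as the target

print("MSE:", mse)    # (4 + 1 + 9 + 0) / 4 = 3.5
print("RMSE:", rmse)
```

RMSE is often easier to interpret than MSE because it is expressed in the units of Price rather than in squared units.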