python scikit-learn regression linear-regression train-test-split

What's a good R-squared score?

I ran this linear regression code and got the R-squared score from the .score() method. However, the score is hard to interpret because it can be negative. The code runs locally if scikit-learn is installed.

Code:

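# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import numpy as np

boston = load_boston()
X = boston.data
y = boston['target']

# Use only the average-number-of-rooms feature (column 5)
X_rooms = X[:, 5]
X_train, X_test, y_train, y_test = train_test_split(X_rooms, y)

# scikit-learn expects 2-D feature arrays, hence reshape(-1, 1)
reg = LinearRegression()
reg.fit(X_train.reshape(-1, 1), y_train)

# Plot the test points and the fitted line over the observed range of rooms
prediction_space = np.linspace(min(X_rooms), max(X_rooms)).reshape(-1, 1)
plt.scatter(X_test, y_test)
plt.plot(prediction_space, reg.predict(prediction_space), color='black')
plt.show()

# R-squared on the held-out test set
print(reg.score(X_test.reshape(-1, 1), y_test))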

Thanks!

Solution

An R-squared score below 0 is always bad: it means the model predicts worse than simply predicting the mean of the target. This can happen on out-of-sample data; on in-sample data it is impossible for an ordinary least-squares fit with an intercept. Above 0, what counts as good really depends on the use case. The score is more useful for comparing different models than for deciding whether a single model is "good enough". Whether a model is good enough depends on how useful the inferred information is, and that again depends on the use case.
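To make the point about negative scores concrete, here is a minimal sketch that computes R-squared by hand as 1 - SS_res / SS_tot and compares it with a mean-only baseline. It uses synthetic data rather than the Boston dataset, and the helper name `r_squared`, the coefficients, and the noise level are all assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): one feature, noisy linear target
rng = np.random.default_rng(0)
X = rng.uniform(3, 9, size=(200, 1))
y = 9.0 * X[:, 0] - 35.0 + rng.normal(scale=5.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # error of the model
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot


reg = LinearRegression().fit(X_train, y_train)
manual_r2 = r_squared(y_test, reg.predict(X_test))
sklearn_r2 = reg.score(X_test, y_test)  # same quantity as the manual formula

# Baseline that always predicts the training mean; its test R^2 is at most 0
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_r2 = baseline.score(X_test, y_test)

# A deliberately bad constant prediction gives a clearly negative R^2
bad_pred = np.full_like(y_test, y_test.mean() + 50.0)
bad_r2 = r_squared(y_test, bad_pred)

print("linear model:", manual_r2, sklearn_r2)
print("mean baseline:", baseline_r2)
print("bad constant:", bad_r2)
```

Reading the score relative to the baseline (around 0) and relative to competing models is usually more informative than looking for a universal threshold.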