python, machine-learning, scikit-learn, linear-regression

Compare predicted values with true values


I've split my data into X_train, X_test, y_train, and y_test, and am trying to compare y_pred (my model's predictions for X_test) with the ground-truth values, y_test.

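from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Splitting data (X and y are assumed to be loaded already)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)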

# Trying to predict X_test with the model            
y_pred = model.predict(X_test)

# How do I compare y_pred with the y_test? 
print(model.score(X_test,y_test))

What do I pass as parameters to model.score( , ) to compare y_pred with y_test?

Do I print the score of X_test and y_test? My code just doesn't seem right.


Solution

  • LinearRegression.score works the way you called it: you pass in an X and the corresponding y. It makes its own prediction from X internally and scores that against y, without ever showing you the prediction.

    Accordingly, I recommend not using model.score(), because it's a bit of an opaque function. You never get to see the prediction, and you don't know what the metric is without referring to the docs (it depends on the model; for a regressor such as LinearRegression it's R², the coefficient of determination).

    Better to make a prediction, import the metric you want, and compute it explicitly. For example, to use mean squared error:

    from sklearn.metrics import mean_squared_error
    
    y_pred = model.predict(X_test)
    
    # scikit-learn metrics take (y_true, y_pred), in that order
    print(mean_squared_error(y_test, y_pred))
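
If it helps to see what these two metrics actually compute, here is a minimal pure-Python sketch. The helper names (mse_by_hand, r2_by_hand) and the numbers are made up for illustration and are not part of scikit-learn. With default settings and a single target, sklearn.metrics.mean_squared_error and sklearn.metrics.r2_score should give the same values for the same inputs.

```python
# Illustrative values only; in practice these would be y_test and model.predict(X_test).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 10.0]


def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot. Assumes y_true is not constant (SS_tot > 0)."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the predictions are from the truth
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far the truth is from its own mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print("MSE:", mse_by_hand(y_true, y_pred))
print("R^2:", r2_by_hand(y_true, y_pred))
```

Lower MSE is better, and it is in squared units of y. R² is at most 1, where 1 means a perfect fit; on test data it can be negative when the model does worse than always predicting the mean of y_test.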