Tags: scikit-learn, mse, mean-square-error

mean squared error in scikit learn RidgeCV


My question is: in sklearn, how are the cv_values_ returned by RidgeCV calculated, and why do they differ from the output of metrics.mean_squared_error?

For example,

import numpy as np
import matplotlib.pyplot as plt

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([1, 3.5, 4, 4.9, 6.1, 7.2, 8.1, 8.9, 10, 11.1])

fig, ax = plt.subplots()
ax.plot(X, y, 'o')
ax.plot(X, X + 1, '-')  # help visualize

[Figure: scatter of the data points with the line y = x + 1 overlaid]

Say we train the Ridge model on X and y

from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
model = RidgeCV(alphas = [0.001], store_cv_values=True).fit(X, y)

Now the output of

mean_squared_error(y_true=y, y_pred=model.predict(X))

is 0.1204000013110009, while the output of

model.cv_values_.mean()

is 0.24472577167818438.

Why is there such a huge difference? Am I missing something obvious?


Solution

  • From the official RidgeCV documentation:

    cv_values_

    Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).

    In your case, when you call

    model = RidgeCV(alphas=[0.001], store_cv_values=True).fit(X, y)

    you have cv=None.

    cv=None means that Leave-One-Out cross-validation is used.

    So cv_values_ stores the squared error for each sample under Leave-One-Out cross-validation. Every fold contains exactly one test point, so n = 1. In other words, cv_values_ gives the squared error for every point in your training set, measured when that point was the single held-out test fold.

    Finally, this means that when you call model.cv_values_.mean(), you get the mean of these individual errors (one squared error per point). To see the individual errors, use print(model.cv_values_). A leave-one-out sketch that reproduces these values is given at the end of this answer.

    Individual means that n = 1 in the following equation:

    MSE = (1/n) · Σ_{i=1}^{n} (y_i − ŷ_i)²

    On the other hand, mean_squared_error(y_true=y, y_pred=model.predict(X)) evaluates the model fitted on all the data against those same 10 training points, i.e. it puts n = 10 in this equation and holds nothing out.

    So the two results will differ.
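
Here is a minimal sketch, assuming the same toy data and alpha = 0.001 as above, of how the leave-one-out errors in cv_values_ can be reproduced by hand. Each iteration refits a plain Ridge model with one point held out and records the squared error on that single point; those per-point errors should match model.cv_values_ up to numerical precision.

import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

# Same toy data as in the question
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([1, 3.5, 4, 4.9, 6.1, 7.2, 8.1, 8.9, 10, 11.1])

alpha = 0.001
model = RidgeCV(alphas=[alpha], store_cv_values=True).fit(X, y)

# Manual leave-one-out: refit a plain Ridge model without the i-th point
# and record the squared error on that single held-out point (n = 1).
loo_sq_errors = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i          # drop the i-th sample
    ridge = Ridge(alpha=alpha).fit(X[mask], y[mask])
    pred_i = ridge.predict(X[i].reshape(1, -1))[0]
    loo_sq_errors.append((y[i] - pred_i) ** 2)
loo_sq_errors = np.array(loo_sq_errors)

print(model.cv_values_.ravel())   # per-point LOO squared errors from RidgeCV
print(loo_sq_errors)              # should agree up to numerical precision
print(model.cv_values_.mean(), loo_sq_errors.mean())

If the two arrays agree, their mean reproduces the 0.2447... figure, while mean_squared_error(y, model.predict(X)) averages the in-sample errors of the model fitted on all 10 points (n = 10), which is the source of the discrepancy.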