python scikit-learn cross-validation k-fold

Different RMSE from cross_validate and iterating Kfolds

I want to write my own function for cross-validation because I can't use cross_validate in this situation.

Correct me if I am wrong, but my cross_validate call is:

cv = cross_validate(elastic.est,X,y,cv=5,scoring='neg_mean_squared_error')

output :

{'fit_time': array([3.90563273, 5.272861  , 2.19111824, 6.42427135, 5.62084389]),
 'score_time': array([0.05504966, 0.06105542, 0.0530467 , 0.06006551, 0.05603933]),
 'test_score': array([-0.00942235, -0.01220626, -0.01157624, -0.00998556, -0.01144867])}

So I have done this to calculate the RMSE:

math.sqrt(abs(cv["test_score"]).mean())

The result is always around 0.104

I've then written the function below to loop over the KFold splits, and I'm always getting a much lower RMSE (and it runs about 10 times quicker):

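import math
import numpy as np
from sklearn.model_selection import KFold

def get_rmse(y_true, y_pred):
    # Root mean squared error for a single fold
    return math.sqrt(((y_pred - y_true) ** 2).mean())

listval = []

# Convert once, outside the loop, so positional indexing works
Xx = np.array(X)
yy = np.array(y)

kf = KFold(n_splits=5, shuffle=True)

for train_index, test_index in kf.split(Xx, yy):
    X_train, X_test = Xx[train_index], Xx[test_index]
    y_train, y_test = yy[train_index], yy[test_index]

    # Refit on the training folds and score on the held-out fold
    elastic.est.fit(X_train, y_train)
    preds = elastic.est.predict(X_test)
    listval.append(get_rmse(y_test, preds))

# Mean of the per-fold RMSE values
np.mean(listval)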

The result is 0.0729 and always lands around this value.

What am I missing? Same data, same estimator, same number of folds.

Solution

The difference you observe comes from the fact that you compute the final number differently:

For the cross_validate output, you first average the MSE over the folds and then take the square root.
For the custom implementation, you first take the square root of each fold's MSE and only then average over the folds.

In general, the root of a mean is not equal to the mean of the roots.
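
To make the two orders of aggregation concrete, here is a minimal sketch that applies both to the per-fold MSE values posted in the question (the negated test_score entries). The helper names root_of_mean and mean_of_roots are made up for this illustration and are not part of scikit-learn.

```python
import math

# Per-fold MSE: the negated 'test_score' values from the cross_validate
# output shown in the question.
fold_mse = [0.00942235, 0.01220626, 0.01157624, 0.00998556, 0.01144867]


def root_of_mean(mse_values):
    """Average the MSE over the folds first, then take the square root."""
    return math.sqrt(sum(mse_values) / len(mse_values))


def mean_of_roots(mse_values):
    """Take the square root per fold (per-fold RMSE), then average."""
    return sum(math.sqrt(m) for m in mse_values) / len(mse_values)


# The two aggregation orders applied to the same fold scores.
print("root of mean :", root_of_mean(fold_mse))
print("mean of roots:", mean_of_roots(fold_mse))

# A toy case with very uneven folds, where the gap is much larger:
# root of mean is sqrt(2), mean of roots is 1.0.
toy_mse = [0.0, 4.0]
print("toy root of mean :", root_of_mean(toy_mse))
print("toy mean of roots:", mean_of_roots(toy_mse))
```

The root of the mean is never smaller than the mean of the roots, and the gap grows as the fold scores become more uneven. For the fold scores posted in the question, both numbers come out near 0.104, so it is worth aggregating the same way in both places and then checking what else differs between the two runs. One visible difference is that the KFold loop uses shuffle=True, whereas cv=5 in cross_validate splits the data without shuffling.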