I want to write my own function for cross-validation, as I can't use cross_validate in this situation.
Correct me if I am wrong, but my cross_validate code is:
cv = cross_validate(elastic.est, X, y, cv=5, scoring='neg_mean_squared_error')
Output:
{'fit_time': array([3.90563273, 5.272861 , 2.19111824, 6.42427135, 5.62084389]),
'score_time': array([0.05504966, 0.06105542, 0.0530467 , 0.06006551, 0.05603933]),
'test_score': array([-0.00942235, -0.01220626, -0.01157624, -0.00998556, -0.01144867])}
So I have done this to calculate the RMSE:
math.sqrt(abs(cv["test_score"]).mean())
The result is always around 0.104.
I've then written the function below to loop over the KFold splits, and I'm always getting a much lower RMSE score (it also runs about 10 times quicker):
import math
import numpy as np
from sklearn.model_selection import KFold

def get_rmse(y_true, y_pred):
    # RMSE for a single fold
    score = math.sqrt(((y_pred - y_true) ** 2).mean())
    return score

listval = []
kf = KFold(n_splits=5, shuffle=True)
for train_index, test_index in kf.split(X, y):
    Xx = np.array(X)
    yy = np.array(y)
    X_train, X_test = Xx[train_index], Xx[test_index]
    y_train, y_test = yy[train_index], yy[test_index]
    elastic.est.fit(X_train, y_train)
    preds = elastic.est.predict(X_test)
    listval.append(get_rmse(y_test, preds))

# average of the per-fold RMSE values
np.mean(listval)
The result is 0.0729 and always lands around this value.
What am I missing? Same data, same estimator, same number of folds?
The difference that you observe comes from the fact that you compute the final number differently: with the cross_validate output you first average the MSE over the folds and then take the square root, whereas in your loop you take the square root within each fold and then average those per-fold RMSEs. Of course, in general the root of the mean is not equal to the mean of the roots.
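As a minimal sketch of that point (using made-up per-fold MSE values, not your actual scores), aggregating the same fold errors in the two orders gives different numbers:

import numpy as np

# hypothetical per-fold MSE values, chosen only to illustrate the point
fold_mse = np.array([0.01, 0.04])

root_of_mean = np.sqrt(fold_mse.mean())   # sqrt((0.01 + 0.04) / 2) ≈ 0.1581
mean_of_roots = np.sqrt(fold_mse).mean()  # (0.1 + 0.2) / 2 = 0.15

print(root_of_mean, mean_of_roots)        # the two values differ

If you want the cross_validate result aggregated the same way as your loop, you could average the per-fold roots instead, e.g. np.sqrt(-cv['test_score']).mean() (the scores are negated MSE, hence the minus sign).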