I have a dataset that I split with the holdout method using scikit-learn. Here is the procedure:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)
I am using a random forest as the classifier. Here is the code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)
R_y_pred = clf.predict(X_test)
target_names = ['Alive', 'Dead']
print(classification_report(y_test, R_y_pred, target_names=target_names))
Now I would like to use stratified k-fold cross-validation on the training set. This is the code I have written for that:
cv_results = cross_validate(clf, X_train, y_train, cv=5)
R_y_pred = cv_results.predict(X_test)
target_names = ['Alive', 'Dead']
print(classification_report(y_test, R_y_pred, target_names=target_names))
I get an error saying that cv_results has no attribute predict.
How can I print the classification report after using k-fold cross-validation?
Thank you.
cv_results is not a model: cross_validate returns a dictionary of scores (plus fit and scoring times) that show how well the model performs on each of the cross-validation splits (5, as specified in this case).
It cannot be used to make predictions.
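A minimal sketch of what cross_validate actually hands back, using synthetic data from make_classification as a stand-in for the real X_train and y_train (an assumption for illustration only). Passing return_estimator=True additionally keeps the model fitted on each training fold; those fold models have predict, while the dictionary itself does not:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the question's X_train / y_train (assumption: binary labels)
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)

clf = RandomForestClassifier(random_state=0)

# return_estimator=True also stores the estimator fitted on each training fold
cv_results = cross_validate(clf, X_train, y_train, cv=5, return_estimator=True)

print(type(cv_results))    # a plain dict, not an estimator
print(sorted(cv_results))  # the available keys

# Each fold's fitted model can predict; the dict itself cannot
fold_models = cv_results["estimator"]
print(len(fold_models), hasattr(fold_models[0], "predict"))
```

Each fold model is trained on only four fifths of the training data, so these are mostly useful for inspection rather than as the final model.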
For instance, on a separate problem (predicting hotel cancellations with a classification model), 5-fold cross-validation with a random forest classifier yields the following test scores:
>>> from sklearn.model_selection import cross_validate
>>> cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
>>> cv_results
{'fit_time': array([1.09486771, 1.13821363, 1.11560798, 1.08220959, 1.06806993]),
'score_time': array([0.07809329, 0.10946631, 0.09018588, 0.07582998, 0.07735801]),
'test_score': array([0.84440007, 0.85172242, 0.85322017, 0.84656349, 0.84190381])}
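Since the question asks specifically for stratified k-fold: when cv is an integer, the estimator is a classifier, and y is binary or multiclass, cross_validate already uses StratifiedKFold, so cv=5 is stratified (without shuffling). The sketch below makes the splitter explicit, turns on shuffling, requests more than one metric, and summarises each as mean +/- standard deviation. The synthetic, imbalanced dataset and the chosen metrics are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in for the real training data (assumption)
X_train, y_train = make_classification(
    n_samples=300, n_features=10, weights=[0.7, 0.3], random_state=0
)
clf = RandomForestClassifier(random_state=0)

# Explicit stratified splitter; each fold keeps roughly the 70/30 class ratio
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_results = cross_validate(
    clf, X_train, y_train, cv=skf, scoring=["accuracy", "f1_macro"]
)

# With several metrics, the keys become test_<metric name>
for name in ("test_accuracy", "test_f1_macro"):
    scores = cv_results[name]
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```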
However, trying to call predict on cv_results returns the same error:
>>> from sklearn.model_selection import cross_validate
>>> cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
>>> cv_results
>>> R_y_pred = cv_results.predict(x1_val)
>>> print(classification_report(y_test, R_y_pred))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[33], line 4
2 cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
3 cv_results
----> 4 R_y_pred = cv_results.predict(x1_val)
5 print(classification_report(y_test, R_y_pred))
AttributeError: 'dict' object has no attribute 'predict'
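To actually get a classification report, there are two usual options, sketched below with synthetic data standing in for the question's X and y (assumption: labels 0 and 1 correspond to 'Alive' and 'Dead' in that order, because classification_report matches target_names to the sorted labels). cross_val_predict returns out-of-fold predictions for the training set, which can be passed straight to classification_report. For the held-out test set, the classifier has to be fitted on the full training set, because cross-validation never returns a single fitted model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

# Synthetic stand-in for the question's X, y (assumption: 0 = Alive, 1 = Dead)
X, y = make_classification(n_samples=500, n_features=10, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

target_names = ["Alive", "Dead"]
clf = RandomForestClassifier(random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# 1) Out-of-fold predictions: each training sample is predicted by a model
#    fitted on the other folds (cross_val_predict clones clf internally)
y_oof = cross_val_predict(clf, X_train, y_train, cv=skf)
print("Cross-validated report (training set):")
print(classification_report(y_train, y_oof, target_names=target_names))

# 2) Final model: fit on the whole training set, then evaluate on the test set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Held-out report (test set):")
print(classification_report(y_test, y_pred, target_names=target_names))
```

The first report estimates how the model generalises within the training data; the second is the usual final check on data the model never saw during training or cross-validation.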