pandas scikit-learn classification cross-validation k-fold

Print a classification report after k-fold cross-validation with sklearn

I have a dataset that I split using the holdout method in sklearn. The procedure is as follows:

from sklearn.model_selection import train_test_split
(X_train, X_test, y_train, y_test)=train_test_split(X,y,test_size=0.3, stratify=y)

I am using a random forest as the classifier. The code for that is as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)
R_y_pred = clf.predict(X_test)
target_names = ['Alive', 'Dead']
print(classification_report(y_test, R_y_pred, target_names=target_names))

Now I would like to use stratified k-fold cross-validation on the training set. This is the code I have written for that:

cv_results = cross_validate(clf, X_train, y_train, cv=5)
R_y_pred = cv_results.predict(X_test)
target_names = ['Alive', 'Dead']
print(classification_report(y_test, R_y_pred, target_names=target_names))

I got an error saying that cv_results has no attribute predict.

I would like to know how I can print the classification report after using k-fold cross-validation.

Thank you.

Solution

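cross_validate returns a dictionary of fit times, score times, and test scores, with one entry per fold (5 in this case). The test scores show how well the model predicts each held-out fold; for a classifier with the default scoring, test_score is the accuracy.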

It is not a fitted model, so it cannot be used to make predictions.
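If the goal is a classification report based on the cross-validation folds themselves, one option is cross_val_predict, which returns the out-of-fold prediction for every training sample. The following is a minimal sketch, not the asker's actual setup: the data comes from make_classification as a stand-in for X and y, and n_estimators=50 and the name oof_pred are choices made only for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each training sample is predicted by a model fitted on the other four folds.
oof_pred = cross_val_predict(clf, X_train, y_train, cv=cv)

# Compare against y_train, not y_test: the predictions cover the training set.
report = classification_report(y_train, oof_pred, target_names=["Alive", "Dead"])
print(report)
```

Note that the report is computed against y_train, because the out-of-fold predictions cover the training samples rather than the held-out test set.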

For instance, on a separate problem (predicting hotel cancellations with a classification model), 5-fold cross-validation with a random forest classifier yields the following test scores:

>>> from sklearn.model_selection import cross_validate
>>> cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
>>> cv_results

{'fit_time': array([1.09486771, 1.13821363, 1.11560798, 1.08220959, 1.06806993]),
 'score_time': array([0.07809329, 0.10946631, 0.09018588, 0.07582998, 0.07735801]),
 'test_score': array([0.84440007, 0.85172242, 0.85322017, 0.84656349, 0.84190381])}
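The dictionary above holds only accuracy because no scoring argument was passed. As a hedged sketch, cross_validate also accepts a list of metric names, in which case the per-fold results appear under keys of the form test_<metric>. The data here is synthetic (make_classification), and X_demo, y_demo, and n_estimators=50 are names and values assumed for the example rather than the hotel dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic binary data standing in for the real training set (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Request several metrics at once; each gets its own 'test_<name>' array.
scoring = ["accuracy", "precision", "recall", "f1"]
cv_results = cross_validate(clf, X_demo, y_demo, cv=5, scoring=scoring)

# Summarise each metric across the five folds.
for metric in scoring:
    fold_scores = cv_results[f"test_{metric}"]
    print(f"{metric}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
```

This gives fold-level precision, recall, and F1 for the positive class, which is close to what classification_report prints, but still as a dictionary of scores rather than a model.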

However, calling predict on this result returns the same error message, because cv_results is a dictionary rather than a model:

>>> from sklearn.model_selection import cross_validate
>>> cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
>>> cv_results
>>> R_y_pred = cv_results.predict(x1_val)
>>> print(classification_report(y_test, R_y_pred))

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[33], line 4
      2 cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
      3 cv_results
----> 4 R_y_pred = cv_results.predict(x1_val)
      5 print(classification_report(y_test, R_y_pred))

AttributeError: 'dict' object has no attribute 'predict'
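One way around the error, sketched under assumptions, is to treat cross-validation purely as a performance estimate and then fit the classifier itself on the whole training set before predicting on the test set. cross_validate fits clones of the estimator, so clf is still unfitted afterwards. The data below is synthetic (make_classification), and n_estimators=50 and the names test_pred and test_report are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_validate, train_test_split

# Synthetic stand-in for the asker's X and y (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Step 1: cross-validation estimates performance and returns a plain dict.
cv_results = cross_validate(clf, X_train, y_train, cv=5)
print("CV accuracy per fold:", cv_results["test_score"])

# Step 2: fit the estimator on the full training set, then predict on the test set.
clf.fit(X_train, y_train)
test_pred = clf.predict(X_test)
test_report = classification_report(y_test, test_pred, target_names=["Alive", "Dead"])
print(test_report)
```

The cross-validation scores and the test-set report answer different questions: the former estimates how stable the model is across splits of the training data, the latter measures the final model on data it has never seen.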