pandas, scikit-learn, classification, cross-validation, k-fold

Print a classification report with k-fold cross-validation using the sklearn package


I have a dataset that I split with the holdout method using sklearn. The procedure is as follows:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

I am using a random forest as the classifier. The code for that is:

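from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

clf = RandomForestClassifier(random_state=0)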
clf.fit(X_train, y_train)
R_y_pred = clf.predict(X_test)
target_names = ['Alive', 'Dead']
print(classification_report(y_test, R_y_pred, target_names=target_names))

Now I would like to use stratified k-fold cross-validation on the training set. The code I have written for that is:

cv_results = cross_validate(clf, X_train, y_train, cv=5)
R_y_pred = cv_results.predict(X_test)
target_names = ['Alive', 'Dead']
print(classification_report(y_test, R_y_pred, target_names=target_names))

I get an error saying that cv_results has no attribute predict.

I would like to know how I can print the classification results after using k-fold cross-validation.

Thank you.


Solution

  • cv_results is simply a dictionary of fit times, score times, and test scores that show how well the model performs on each cross-validation split (5, as specified in this case).

    It is not a fitted model, so it cannot be used to make predictions.

    For instance, in a separate problem of predicting hotel cancellations with a classification model, 5-fold cross-validation with a random forest classifier yields the following test scores:

    >>> from sklearn.model_selection import cross_validate
    >>> cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
    >>> cv_results
    
    {'fit_time': array([1.09486771, 1.13821363, 1.11560798, 1.08220959, 1.06806993]),
     'score_time': array([0.07809329, 0.10946631, 0.09018588, 0.07582998, 0.07735801]),
     'test_score': array([0.84440007, 0.85172242, 0.85322017, 0.84656349, 0.84190381])}
    

    However, when attempting to call predict on cv_results, the same error is raised:

    >>> from sklearn.model_selection import cross_validate
    >>> cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
    >>> cv_results
    >>> R_y_pred = cv_results.predict(x1_val)
    >>> print(classification_report(y_test, R_y_pred))
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[33], line 4
          2 cv_results = cross_validate(clf, x1_train, y1_train, cv=5)
          3 cv_results
    ----> 4 R_y_pred = cv_results.predict(x1_val)
          5 print(classification_report(y_test, R_y_pred))
    
    AttributeError: 'dict' object has no attribute 'predict'
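
One possible way to get a classification report out of k-fold cross-validation is `cross_val_predict`, which returns an out-of-fold prediction for every training sample instead of a dictionary of scores. The sketch below is only an illustration under stated assumptions: the data comes from `make_classification` as a stand-in for the asker's `X` and `y`, the labels are assumed to be binary so that the two `target_names` apply, and the `StratifiedKFold` settings (`shuffle=True`, `random_state=0`) are choices made here, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    train_test_split,
)

# Synthetic stand-in for the asker's X and y (assumption: binary labels 0/1).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(random_state=0)
target_names = ['Alive', 'Dead']

# Stratified 5-fold CV on the training set. Each training sample is predicted
# by a model that was fitted on the other four folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_pred = cross_val_predict(clf, X_train, y_train, cv=cv)
print(classification_report(y_train, cv_pred, target_names=target_names))

# cross_val_predict works on clones and leaves clf unfitted, so fit it
# explicitly on the full training set before predicting the hold-out set.
clf.fit(X_train, y_train)
test_pred = clf.predict(X_test)
print(classification_report(y_test, test_pred, target_names=target_names))
```

The first report summarises cross-validated performance on the training set; the second is the usual hold-out report from a single model fitted on all of the training data. Note that the cross-validated report is compared against `y_train`, not `y_test`, because the predictions cover the training samples.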