python, scikit-learn, cross-validation

Using "cross_val_predict" with "RepeatedStratifiedKFold" throws error


I am trying to plot a ROC curve. I computed the ROC AUC scores as shown below:

# bagged decision trees on an imbalanced classification problem
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
# generate dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=4)
# define model
model = BaggingClassifier()
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
# summarize performance
print('Mean ROC AUC: %.3f' % mean(scores))

But when I call cross_val_predict (to get the probabilities for plotting the ROC curve) as below:

from sklearn.model_selection import cross_val_predict
scores2 = cross_val_predict(model, X, y, cv=cv, method='predict_proba')

I get this error:

ValueError: cross_val_predict only works for partitions

How can I modify the code to plot the curve?

I found a somewhat similar problem in another Stack Overflow question, but it has not been answered yet.


Solution

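  • cross_val_predict needs the test folds to form a partition of the data, meaning every sample lands in exactly one test fold. RepeatedStratifiedKFold with n_repeats=3 puts every sample in a test fold three times, which is why the ValueError is raised. cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1, error_score='raise') accepts the repeated splitter because it fits the model on each split and returns one score per split, but it returns scores only, not the predicted probabilities needed for a curve. To get out-of-fold probabilities, pass cross_val_predict a non-repeated splitter such as StratifiedKFold, and handle the repeats yourself if you need them.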
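
A minimal sketch of that approach, assuming one shuffled StratifiedKFold per repeat is an acceptable stand-in for RepeatedStratifiedKFold. The seed values, the random_state on the model, and the variable names (n_repeats, curves, aucs) are illustrative choices, not part of the original question. Note that an AUC computed from pooled out-of-fold predictions is not the same quantity as the mean of per-fold AUCs that cross_val_score reports, so the two numbers can differ slightly.

```python
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# same dataset as in the question
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=4)

# random_state is an assumption, added only to make the sketch reproducible
model = BaggingClassifier(random_state=1)

n_repeats = 3   # mirrors n_repeats=3 from the question
curves = []     # one (fpr, tpr) pair per repeat
aucs = []       # one pooled out-of-fold AUC per repeat

for repeat in range(n_repeats):
    # a single shuffled StratifiedKFold is a true partition:
    # every sample is in exactly one test fold
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=repeat)
    proba = cross_val_predict(model, X, y, cv=cv, method='predict_proba')
    pos = proba[:, 1]  # probability of the positive class
    fpr, tpr, _ = roc_curve(y, pos)
    curves.append((fpr, tpr))
    aucs.append(roc_auc_score(y, pos))

print('Mean ROC AUC over repeats: %.3f' % mean(aucs))
```

Each (fpr, tpr) pair in curves can then be drawn with matplotlib, for example plt.plot(fpr, tpr), giving one ROC curve per repeat.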