python, machine-learning, decision-tree

Why does cross_val_score return several scores?


I have the following code:

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
trainPrediction=tree.predict(trainData)
score=cross_val_score(tree, trainData, trainPrediction)

With the code above, I get a score that looks like this:

[0.96052632 0.93421053 0.89473684 0.94736842 0.92      ]

I was expecting a single number as the score, not an array. How do I read this output? Which value would be considered the score?

Some other classifiers I tried (like SVM) have a score(...) function, which worked well. DecisionTreeClassifier also seems to have this function, but I get an error when I try to use it like this:

trainScore=score(trainData, trainPrediction)

The error I get is: TypeError: 'numpy.float64' object is not callable

The documentation shows score(X, y[, sample_weight]), but I guess I don't really understand it.

The reason I used cross_val_score(...) is that it is used in the DecisionTreeClassifier documentation: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

Note

I also tried using accuracy_score(...), as in this example:

Accuracy score of a Decision Tree Classifier

But this does not work, as this function is not part of this classifier.


Solution

  • sklearn.model_selection.cross_val_score evaluates the estimator by cross-validation: it splits the input data into k folds and, for each fold, fits the model on the remaining folds and scores it on the held-out one. The result is therefore an array of k scores, one per fold. You get 5 values because cv defaults to 5-fold cross-validation (since scikit-learn 0.22), but you can set cv to another value.

    Here's an example using the iris dataset:

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.datasets import load_iris
    
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    cls = tree.fit(X_train, y_train)
    y_pred = cls.predict(X_test)
    

    Now with the default settings:

    score = cross_val_score(cls, X_test, y_test)
    score
    # array([1., 1., 1., 1., 1.])
    

    Or for three folds:

    score = cross_val_score(cls, X_test, y_test, cv=3)
    score
    # array([1., 1., 1.])
    

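    Also note that cross_val_score expects the feature matrix X and the true target values y, not the predicted values. In your code you pass trainPrediction as the target, so the model is scored against its own predictions. You should be feeding it the true labels instead, as with X_test and y_test above.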
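
If you want a single number, two common options are to average the per-fold scores or to score the fitted tree once on held-out data. The sketch below shows both on the iris dataset. The variable names and the random_state passed to train_test_split are my own choices for illustration, not something from the question. It also shows score(...) called as a method of the fitted classifier. The TypeError in the question most likely comes from calling a bare score(...) after the name score had already been bound to an earlier numeric result.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)

# Option 1: cross-validation gives one accuracy per fold; summarize them.
cv_scores = cross_val_score(tree, X, y, cv=5)
mean_cv_score = cv_scores.mean()
std_cv_score = cv_scores.std()
print(f"CV accuracy: {mean_cv_score:.3f} +/- {std_cv_score:.3f}")

# Option 2: fit once and score on a held-out test set.
# random_state=0 is an arbitrary choice to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree.fit(X_train, y_train)

# score() is a method of the estimator; pass features and TRUE labels.
holdout_score = tree.score(X_test, y_test)

# accuracy_score lives in sklearn.metrics, not on the classifier;
# it compares the true labels with the predictions.
y_pred = tree.predict(X_test)
holdout_score_alt = accuracy_score(y_test, y_pred)
print(holdout_score, holdout_score_alt)
```

For a classifier, tree.score(X_test, y_test) and accuracy_score(y_test, y_pred) report the same mean accuracy, so either one works. The cross-validated mean is usually the more stable estimate on a small dataset, because it does not depend on a single split.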