I have the following code
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
trainPrediction=tree.predict(trainData)
score=cross_val_score(tree, trainData, trainPrediction)
With the code above, I get a score that looks like this:
[0.96052632 0.93421053 0.89473684 0.94736842 0.92 ]
I was expecting a single number as the score, not an array. How do I read this output, and which value would be considered the score?
Some other classifiers I tried (like SVM) have the score(...) function, which worked well. DecisionTreeClassifier also seems to have this function, but I get an error when I try to use it like this:
trainScore=score(trainData, trainPrediction)
The error I get is: TypeError: 'numpy.float64' object is not callable
The documentation shows score(X, y[, sample_weight]), but I guess I don't really understand it.
The reason I used cross_val_score(...) is that the DecisionTreeClassifier documentation uses it:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
Note
I also tried using accuracy_score(...), as in this example:
Accuracy score of a Decision Tree Classifier
But this does not work, as this function is not part of this classifier.
sklearn.model_selection.cross_val_score gives you the score evaluated by cross-validation: it splits the input data into k folds and, for each fold, fits the model on the remaining folds and scores it on the held-out one. The result is therefore an array of k scores, one per fold. You get an array of 5 values because cv defaults to 5, but you can set it to another value.
Here's an example using the iris dataset:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
cls = tree.fit(X_train, y_train)
y_pred = cls.predict(X_test)
Now with the default settings:
score = cross_val_score(cls, X_test, y_test)
score
# array([1., 1., 1., 1., 1.])
Or for three folds:
score = cross_val_score(cls, X_test, y_test, cv=3)
score
# array([1., 1., 1.])
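If a single number is what you want, a common convention is to report the mean of the per-fold scores, optionally with their standard deviation. The following is a minimal sketch of that idea; the variable names (fold_scores, mean_score, std_score) are mine, and it cross-validates on the whole iris dataset rather than on a held-out split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)

# One accuracy value per fold; cv=5 yields an array of five scores.
fold_scores = cross_val_score(tree, X, y, cv=5)

# Summarize the folds as a single number plus a measure of spread.
mean_score = fold_scores.mean()
std_score = fold_scores.std()
print(f"accuracy: {mean_score:.3f} +/- {std_score:.3f}")
```

Note that the estimator passed in does not need to be fitted beforehand, since cross_val_score fits it on each training split itself.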
Also note that cross_val_score expects X and the true target variable y, not the predicted values. Hence you should be feeding it X_test and y_test rather than the output of predict.
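The same rule applies to the score(X, y) method from the question: it is called on the fitted estimator and takes the true labels. The TypeError quoted in the question indicates that the bare name score was bound to a number rather than a function at that point. Below is a sketch under the assumption that you want a single held-out accuracy; test_score and same_score are illustrative names, and accuracy_score is imported from sklearn.metrics rather than being a method of the classifier:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# score() is a method of the fitted estimator: pass features and TRUE labels.
test_score = tree.score(X_test, y_test)

# accuracy_score compares true labels against predictions explicitly.
y_pred = tree.predict(X_test)
same_score = accuracy_score(y_test, y_pred)

print(test_score, same_score)
```

For a classifier, score returns the mean accuracy, so both calls should give the same value.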