I'm trying to build a decision tree, and found the following code online.
My question is:
What does clf.score(X_train,Y_train) evaluate for a decision tree? The output is in the following screenshot, and I'm wondering what that value represents.
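from sklearn.tree import DecisionTreeClassifier

# Fit a depth-limited tree, then print its score on the training and test sets
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, Y_train)
print("Training:" + str(clf.score(X_train, Y_train)))
print("Test:" + str(clf.score(X_test, Y_test)))
pred = clf.predict(X_train)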
Output:
And the following code, I think, calculates several scores for the model. The higher I set max_depth, the more the score increases, which is easy for me to understand. However, I'm wondering what the difference is between these numbers and the Training and Test values in the previous screenshot.
As correctly pointed out in the comments, it is indeed the mean accuracy on the training set; you could have guessed as much simply by comparing the four different scores in your 2nd screenshot with the training one in your 1st. But in any case, before opening such questions here, you should first consult the relevant documentation, which is arguably your best friend in similar cases. Quoting from the score method of the scikit-learn DecisionTreeClassifier docs:
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
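To make the quoted definition concrete, here is a minimal sketch showing that the value returned by score is just the fraction of samples whose predicted label matches the true label, for whichever data you pass in (training or test). The iris dataset, the 70/30 split, and the random_state values are assumptions chosen only for illustration; they are not the data from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only: iris stands in for the question's X and Y.
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, Y_train)

# What the question prints: mean accuracy on each set.
train_score = clf.score(X_train, Y_train)
test_score = clf.score(X_test, Y_test)

# The same numbers computed by hand: fraction of correctly predicted labels.
manual_train = np.mean(clf.predict(X_train) == Y_train)
manual_test = np.mean(clf.predict(X_test) == Y_test)

# accuracy_score from sklearn.metrics gives the same quantity.
metric_train = accuracy_score(Y_train, clf.predict(X_train))

print("Training:", train_score, manual_train, metric_train)
print("Test:", test_score, manual_test)
```

On each printed line the values should agree, which is all that "mean accuracy" means here: the "Training" number scores the tree on the data it was fitted to, and the "Test" number scores it on held-out data.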