python machine-learning scikit-learn classification decision-tree

What does clf.score(X_train, Y_train) evaluate for a decision tree?


I'm trying to build a decision tree, and found the following code online.

My question is:

  • What does clf.score(X_train, Y_train) evaluate for a decision tree? The output is in the following screenshot, and I'm wondering what that value represents.

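    from sklearn.tree import DecisionTreeClassifier

    # Fit a tree limited to depth 3, then print its score on each split
    clf = DecisionTreeClassifier(max_depth=3).fit(X_train, Y_train)
    print("Training:" + str(clf.score(X_train, Y_train)))
    print("Test:" + str(clf.score(X_test, Y_test)))
    pred = clf.predict(X_train)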
    

    Output:

    [screenshot of the printed Training and Test scores]

  • The following code, I think, calculates several scores for the model. The higher I set max_depth, the higher the score, which is easy for me to understand. However, what is the difference between these numbers and the Training and Test values in the previous screenshot?

[screenshot of the code and its scores for several max_depth values]

  • My goal is to predict whether a house price is over 20k or not. Which score should I consider when choosing the best-fitting yet simple model?

Solution

  • As correctly pointed out in the comments, it is indeed the mean accuracy on the training set; you could have inferred that already by comparing the four scores in your 2nd screenshot with the training score in your 1st. In any case, before opening such questions here, you should first consult the relevant documentation, which is arguably your best friend in cases like this. Quoting from the description of the score method in the scikit-learn DecisionTreeClassifier docs:

    score(X, y, sample_weight=None)

    Returns the mean accuracy on the given test data and labels.
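
To make the quoted definition concrete, here is a minimal sketch that checks it by hand. It assumes a synthetic dataset from make_classification as a stand-in for your house-price data (which is not available here), and the names train_score, test_score, manual_train and manual_test are only illustrative. The value returned by score should match the fraction of correctly predicted labels, whether you compute that fraction with NumPy or with accuracy_score:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the asker's house-price set).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, Y_train)

# What the question's code prints.
train_score = clf.score(X_train, Y_train)
test_score = clf.score(X_test, Y_test)

# The same quantities computed by hand: the fraction of labels predicted correctly.
manual_train = np.mean(clf.predict(X_train) == Y_train)
manual_test = accuracy_score(Y_test, clf.predict(X_test))

print("Training:", train_score, "by hand:", manual_train)
print("Test:", test_score, "by hand:", manual_test)
```

So "Training" is the accuracy on the same samples the tree was fitted on, while "Test" is the accuracy on samples held out from fitting.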