scikit-learn, random-forest, decision-tree

Which decision tree algorithm is used for the random forest classifier in scikit-learn?


As in the title, I was wondering where I can check which decision tree algorithm is used by RandomForestClassifier in scikit-learn. The attributes list says base_estimator_ = DecisionTreeClassifier, and the algorithm behind DecisionTreeClassifier in scikit-learn is CART, so is that my answer?

link to scikit-learn RandomForest

Any suggestions would be appreciated.


Solution

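  • Yes. Scikit-learn's decision trees use an optimized version of CART (https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart), and RandomForestClassifier builds its trees with DecisionTreeClassifier, so the forest is made of CART trees.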

    CART constructs trees by 'using the feature and threshold that yield the largest information gain' at each node. The function used to measure the quality of a split (that is, how the gain is computed) can be set with the criterion parameter of RandomForestClassifier.

    The default criterion is Gini impurity ("gini"), but you can also select "entropy". In practice the two usually give very similar results; you can find more information here: https://datascience.stackexchange.com/questions/10228/when-should-i-use-gini-impurity-as-opposed-to-information-gain-entropy
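
If you want to check this yourself, here is a minimal sketch, assuming scikit-learn is installed. The iris dataset, n_estimators=10, and the tree_info dictionary are only illustrative choices, not anything from the question. It fits a forest with each criterion and inspects the fitted trees through the estimators_ attribute:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Maps each criterion to (class name of the first tree, that tree's criterion).
tree_info = {}

for criterion in ("gini", "entropy"):
    forest = RandomForestClassifier(
        n_estimators=10, criterion=criterion, random_state=0
    )
    forest.fit(X, y)

    # After fitting, estimators_ holds the individual trees of the forest.
    first_tree = forest.estimators_[0]
    tree_info[criterion] = (type(first_tree).__name__, first_tree.criterion)

    # Every tree in the forest is a DecisionTreeClassifier.
    assert all(isinstance(t, DecisionTreeClassifier) for t in forest.estimators_)

for criterion, info in tree_info.items():
    print(criterion, "->", info)
```

Each fitted tree should report itself as a DecisionTreeClassifier that carries the criterion you passed to the forest.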