machine-learning scikit-learn classification decision-tree

Decision Tree Uniqueness sklearn

I have some questions regarding the decision tree and random forest classifiers.

Question 1: Is a trained Decision Tree unique?

I believe it should be unique, since it maximizes information gain at each split. But if it is unique, why does the decision tree classifier have a random_state parameter? A unique tree would be reproducible on every fit, so random_state should be unnecessary.

Question 2: What does a decision tree actually predict?

While reading about the random forest algorithm, I learned that it averages the class probabilities from its individual trees. But as far as I know, a decision tree predicts a class, not a probability for each class.

Solution

Even without checking out the code, you will see this note in the docs:

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

For splitter='best', this is happening here:

# Draw a feature at random
f_j = rand_int(n_drawn_constants, f_i - n_found_constants,
               random_state)
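
To see the tie-breaking from the docs note in practice, here is a minimal sketch. It assumes a toy dataset I made up with two identical feature columns, so any split on column 0 has exactly the same improvement as the same split on column 1. The helper root_feature is hypothetical and only there for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration): two identical columns, so a
# split on column 0 and the same split on column 1 are exactly tied.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([x, x])
y = np.array([0, 0, 0, 1, 1, 1])


def root_feature(seed):
    """Fit a tree with the given seed and return the feature used at the root."""
    clf = DecisionTreeClassifier(random_state=seed).fit(X, y)
    return int(clf.tree_.feature[0])


# Different seeds may pick different (equally good) features at the root.
roots = {seed: root_feature(seed) for seed in range(20)}
print(roots)

# Refitting with the same seed gives the same choice every time.
repeat = [root_feature(0) for _ in range(5)]
print(repeat)
```

Whichever feature is chosen, the predictions on this data are identical; only the structure of the fitted tree differs. On real data, exact ties are rarer but still possible, which is why random_state has to be fixed for reproducible fits.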

And for your other question, read this:

...

Just build the tree so that the leaves contain not just a single class estimate, but also a probability estimate as well. This could be done simply by running any standard decision tree algorithm, and running a bunch of data through it and counting what portion of the time the predicted label was correct in each leaf; this is what sklearn does. These are sometimes called "probability estimation trees," and though they don't give perfect probability estimates, they can be useful. There was a bunch of work investigating them in the early '00s, sometimes with fancier approaches, but the simple one in sklearn is decent for use in forests.

...
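
As a rough check of the above, the sketch below recomputes a tree's predict_proba by hand as the fraction of training samples of each class in the leaf a sample lands in, and then compares a forest's predict_proba with the average over its trees. The synthetic dataset and the parameter values (max_depth=2, n_estimators=10) are my own assumptions, chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
proba = tree.predict_proba(X)

# Recompute by hand: for each leaf, the class fractions of the training
# samples that ended up in that leaf.
leaf_ids = tree.apply(X)
manual = np.zeros_like(proba)
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    counts = np.bincount(y[in_leaf], minlength=len(tree.classes_))
    manual[in_leaf] = counts / counts.sum()

print(np.allclose(proba, manual))

# predict() is the class with the highest leaf probability.
labels = tree.classes_[proba.argmax(axis=1)]
print(np.array_equal(tree.predict(X), labels))

# A forest's predict_proba is the mean of its trees' predict_proba.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
avg = np.mean([est.predict_proba(X) for est in forest.estimators_], axis=0)
print(np.allclose(forest.predict_proba(X), avg))
```

So a single tree does output per-class probabilities, and the class returned by predict is just the most probable one in the leaf.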