I've just started learning about Decision Trees, so these questions might be a bit silly.
The idea of selecting the root node is a bit confusing. Why can't we randomly select the root node? The only difference it seems to make is that it would make the Decision Tree longer and more complex, but it would get the same result eventually.
Also, as an extension of the feature selection process in Decision Trees, why can't we use something as simple as the correlation between features and target, or a Chi-Square test, to figure out which feature to start with?
Why can't we randomly select the root node?
We can, but the same question then applies to its child nodes, to the child nodes of those child nodes, and so on...
The only difference it seems to make is that it would make the Decision Tree longer and more complex, but it would get the same result eventually.
The more complex the tree is, the higher its variance will be, meaning 2 things:
None of these is good, and even if you make a sensible choice at each step, based on entropy or the Gini impurity index, you will still probably end up with a larger tree than you would like. Yes, that tree might have good accuracy on the training set, but it will probably overfit the training set.
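To make the "sensible choice at each step" concrete, here is a minimal pure-Python sketch. It assumes a toy dataset that is not from the question: four binary features, with a label that is 1 only when features 0 and 1 are both 1. The helper names (`gini`, `split_impurity`, `count_leaves`) are made up for illustration. Both trees are grown until every leaf is pure, so both classify the training data perfectly; the only thing that differs is how many leaves that takes.

```python
import random
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, feature):
    """Weighted Gini impurity of the two children after splitting on a binary feature."""
    left = [y for x, y in rows if x[feature] == 0]
    right = [y for x, y in rows if x[feature] == 1]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def count_leaves(rows, features, choose):
    """Grow a tree until every leaf is pure and return its number of leaves."""
    labels = [y for _, y in rows]
    if len(set(labels)) <= 1 or not features:
        return 1
    f = choose(rows, features)
    remaining = [g for g in features if g != f]
    total = 0
    for value in (0, 1):
        child = [(x, y) for x, y in rows if x[f] == value]
        if child:
            total += count_leaves(child, remaining, choose)
    return total


# Toy data (an assumption for illustration): all 16 combinations of 4 binary
# features; the label depends only on features 0 and 1.
rows = [(x, int(x[0] == 1 and x[1] == 1)) for x in product((0, 1), repeat=4)]
features = [0, 1, 2, 3]

rng = random.Random(0)


def greedy_choice(node_rows, candidates):
    # Pick the feature whose split leaves the least impure children.
    return min(candidates, key=lambda f: split_impurity(node_rows, f))


def random_choice(node_rows, candidates):
    # Ignore the data entirely and pick any remaining feature.
    return rng.choice(candidates)


greedy_leaves = count_leaves(rows, features, greedy_choice)
random_leaves = count_leaves(rows, features, random_choice)
print("leaves with Gini-based choice:", greedy_leaves)
print("leaves with random choice:   ", random_leaves)
```

On this toy data the greedy tree needs only three leaves, and no perfect tree can be smaller, so the random tree can never do better and will often waste splits on the two irrelevant features.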
Most algorithms that use decision trees have their own ways to combat this variance, one way or another. For a simple decision tree on its own, the way to reduce the variance is to first train the tree and then prune it afterwards to make it smaller and less overfitted. Random forest addresses it by averaging over a large number of trees while randomly restricting which predictors can be considered for a split every time that decision has to be made.
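As a rough illustration of both remedies, here is a sketch using scikit-learn. The synthetic dataset and the choice of the pruning strength (simply the middle value of the pruning path) are assumptions made for the example; in practice you would pick `ccp_alpha` by cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 1) Fully grown tree: fits the training set perfectly.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 2) Pruned tree: minimal cost-complexity pruning; a larger ccp_alpha prunes more.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary pick for the demo
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(
    X_train, y_train
)

# 3) Random forest: averages many trees, each split sees a random subset of features.
forest = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", random_state=0
).fit(X_train, y_train)

print("leaves:", full_tree.get_n_leaves(), "->", pruned_tree.get_n_leaves())
for name, model in [("full", full_tree), ("pruned", pruned_tree), ("forest", forest)]:
    print(name, "train:", model.score(X_train, y_train),
          "test:", model.score(X_test, y_test))
```

The exact numbers depend on the data and the seed, so treat the printed scores as something to inspect rather than a guaranteed ranking.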
So, randomly picking the root node will lead to the same result eventually, but only on the training set and only once the overfitting is so extreme that the tree simply predicts everything with 100% accuracy. But the more the tree overfits the training set, the lower its accuracy on a test set will be (in general), and we care about accuracy on the test set, not on the training set.
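If you want to check this yourself, one way to approximate "random selection at every node" in scikit-learn is `splitter="random"` together with `max_features=1`, which considers one randomly chosen feature with a random threshold at each split. This is an approximation of the idea, not exactly what the question describes, and the dataset below is again a synthetic assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (an assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Greedy tree: best Gini split among all features at every node.
greedy_tree = DecisionTreeClassifier(
    criterion="gini", splitter="best", random_state=0
).fit(X_train, y_train)

# "Random" tree: one random feature and a random threshold at every node.
random_tree = DecisionTreeClassifier(
    splitter="random", max_features=1, random_state=0
).fit(X_train, y_train)

for name, tree in [("greedy", greedy_tree), ("random", random_tree)]:
    print(
        name,
        "leaves:", tree.get_n_leaves(),
        "depth:", tree.get_depth(),
        "train acc:", tree.score(X_train, y_train),
        "test acc:", tree.score(X_test, y_test),
    )
```

Both trees are grown until their leaves are pure, so both should reach 100% training accuracy; the interesting columns are the tree size and the test accuracy, which will vary with the data and the seed.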