Random forests train multiple CART trees on bootstrapped samples of the training data. Some sources say that each bootstrapped sample also contains only a subset of the original features (like this), while others say that the bootstrapped sample keeps all the original features and the feature sampling happens at each node, from the full set of features in the original training data. Most resources don't address this point at all and are largely a copy-paste of each other.
Can you tell me which of the following two is part of the random forest algorithm?
Let us say my original set of features is S.
Which one (1 or 2) is it?
The documentation of both scikit-learn's RandomForestClassifier (link) and RandomForestRegressor (link) refers to Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
Breiman writes:
“… random forest with random features is formed by selecting at random, at each node, a small group of input variables to split on.”
So it is the first of your choices: the candidate features are drawn afresh at each node, from the full set S.
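The per-node scheme Breiman describes can be sketched in a few lines. This is a minimal illustration, not scikit-learn's implementation; the names `gini`, `best_split`, and `split_node` are my own:

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def best_split(X, y, candidate_features):
    """Exhaustive threshold search, restricted to the candidate features."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in candidate_features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def split_node(X, y, n_features, m, rng=random):
    # The per-node step: sample m candidate features from the FULL set S,
    # then search for the best split only among those candidates.
    candidates = rng.sample(range(n_features), m)
    return best_split(X, y, candidates)

# Tiny example: feature 0 perfectly separates the classes, so with m = 2
# (all features eligible) the search finds a zero-impurity split on it.
score, f, t = split_node([[0, 5], [1, 7], [10, 6], [11, 8]], [0, 0, 1, 1], 2, 2)
print(score, f, t)
```

Note that `split_node` is called again for every node as the tree grows, so a feature excluded at one node can still be chosen at the next.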
Have a look at this thread for the background: In Random Forest, why is a random subset of features chosen at the node level rather than at the tree level?
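You can also verify this empirically with scikit-learn (a sketch of my own, not from the documentation): if the feature subset were drawn once per tree, a tree trained with `max_features=2` could only ever split on 2 distinct features. Because the sampling happens at each node, a single tree typically splits on many more:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=10, random_state=0)

forest = RandomForestClassifier(n_estimators=10, max_features=2,
                                random_state=0).fit(X, y)

for tree in forest.estimators_[:3]:
    # tree_.feature holds the feature index used at each internal node;
    # leaves are marked with a negative sentinel, so we drop those entries.
    used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
    print(f"distinct features split on: {len(used)}")  # far more than 2
```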