I am following this example to determine feature importance using Random Forests. When I use all of the features versus only a subset of them, these are the results I observe, respectively:
Is there a particular reason the error bars increase drastically when using all possible features? Is there any significance to a negative quantity? (Note: the particular labels on the x-axis in the two plots do not necessarily correspond.)
When you use only the most important features, there is less room for error: the model has fewer chances to learn a pattern where none exists.
Without using feature importances
- There is a high chance that your model is capturing patterns where it shouldn't, and hence giving weight to less important features.
- Also, a Random Forest is an ensemble of decision trees; some trees might capture the correct feature importances and some might not.
- The most important features have such wide error bars because some trees may ignore them altogether or give them the least importance, while other trees capture them correctly.
- Hence, you have both ends of the spectrum, which results in such wide error bars.
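To make the tree-to-tree disagreement concrete, here is a minimal sketch. It assumes the linked example draws each error bar as the standard deviation of the per-tree importances around the forest-level value, as the scikit-learn forest-importance example does. The synthetic dataset and all variable names are illustrative assumptions, not the asker's data. Under that assumption, impurity-based importances themselves are never negative, so a bar reaching below zero would just mean the standard deviation is larger than the mean.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): 3 informative features
# followed by 27 pure-noise features; shuffle=False keeps them in that order.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=3, n_redundant=0,
    n_repeated=0, shuffle=False, random_state=0,
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One row of importances per tree, one column per feature.
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])

mean_importance = forest.feature_importances_  # forest-level importances
spread = per_tree.std(axis=0)                  # disagreement between trees

# Lower end of each error bar; it can be negative even though every
# individual per-tree importance is non-negative.
lower = mean_importance - spread

# Show the five features with the highest forest-level importance.
for i in np.argsort(mean_importance)[::-1][:5]:
    print(f"feature {i}: {mean_importance[i]:.3f} +/- {spread[i]:.3f}")
```

Inspecting a column of `per_tree` shows how differently individual trees rate the same feature, which is what the error bar summarizes.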
Using feature importances
- You successively eliminate the least important features, so in successive trees those features are not considered at all (hence a lower chance of any error in the feature importances).
- Doing this successively improves the chances that the more important features are selected again and again for splitting, hence the error margin is comparatively smaller.
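One way to check this on your own data is to refit on the top-ranked features and compare the spread before and after. The sketch below does a single elimination step rather than a repeated one. The helper `importance_spread`, the synthetic dataset, and the cutoff `k = 5` are hypothetical choices for illustration. Whether the spread actually shrinks depends on the data, so it is worth looking at the printed numbers rather than assuming the outcome.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def importance_spread(X, y, seed=0):
    """Fit a forest; return (forest importances, per-feature std across trees)."""
    forest = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    per_tree = np.array([t.feature_importances_ for t in forest.estimators_])
    return forest.feature_importances_, per_tree.std(axis=0)


# Synthetic data (an assumption): 3 informative features plus 27 noise features.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=3, n_redundant=0,
    n_repeated=0, shuffle=False, random_state=0,
)

# Importances and spread with every feature available to the trees.
mean_all, std_all = importance_spread(X, y)

# Keep only the k highest-ranked features and refit on that subset.
k = 5
top = np.argsort(mean_all)[::-1][:k]
mean_top, std_top = importance_spread(X[:, top], y)

for j, i in enumerate(top):
    print(
        f"feature {i}: all features {mean_all[i]:.3f} +/- {std_all[i]:.3f}, "
        f"subset {mean_top[j]:.3f} +/- {std_top[j]:.3f}"
    )
```

Because importances are normalized to sum to 1 within each fit, the absolute values grow when fewer features share the total, so it is fairer to compare each standard deviation relative to its mean.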