python scikit-learn random-forest training-data feature-selection

Error bars in feature selection increase when using more features?

I am following this example to determine feature importance using Random Forests. When using numerous features and only a subset of these features, these are the results I observe, respectively:

Is there a particular reason the error bars increase drastically when using all possible features? Is there any significance to a negative quantity? (Note: the particular labels on the x-axis in the two plots do not necessarily correspond.)

Solution

When you use only the most important features, there is less room for error, that is, less chance of the model learning a pattern where none exists.

Without using feature importances

There is a high chance that your model is capturing patterns where none exist, and hence assigning importance to less important features that do not deserve it.
Also, a Random Forest is an ensemble of decision trees: some trees may capture the correct feature importances, and some may not.
The most important features have such large error bars because some trees ignore them altogether or give them very little importance, while other trees capture them correctly.
Hence, you see both ends of the spectrum, which results in such a large spread across trees.
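
To make the tree-to-tree spread concrete, here is a minimal sketch. It assumes the linked example draws its error bars as the standard deviation of `feature_importances_` across the individual trees in `estimators_`, as the scikit-learn forest importances example does. The synthetic data from `make_classification` and all variable names are illustrative and not taken from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): 30 features, only 5 informative.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=5,
    n_redundant=0, shuffle=False, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# One row of importances per tree, one column per feature.
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])

mean_imp = per_tree.mean(axis=0)  # height of each bar
std_imp = per_tree.std(axis=0)    # length of each error bar

# Lower end of each error bar; this can be negative even though
# every individual importance is non-negative.
lower = mean_imp - std_imp

order = np.argsort(mean_imp)[::-1]
for i in order[:10]:
    print(f"feature {i:2d}: mean={mean_imp[i]:.3f} std={std_imp[i]:.3f} lower={lower[i]:.3f}")
```

Under that assumption, a negative value in the plot is only the lower end of a bar (mean minus standard deviation). The impurity-based importances themselves are never negative, so the negative region has no meaning of its own beyond indicating that the spread is large relative to the mean.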

Using feature importances

You eliminate the least important features successively, so that in later fits those features are not considered at all (hence there is less chance of error in the feature importances).
Doing this successively increases the chance that the more important features are selected again and again for splitting, so the error margin is comparatively smaller.
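
One way to check this on your own data is to fit the forest twice, once on all features and once on only the top-ranked ones, and compare the per-tree spread. The sketch below is a simplified single-step version of that idea, not the successive elimination itself. It assumes synthetic data, and the helper `importance_stats` and the cutoff `k = 5` are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def importance_stats(X, y, seed=0):
    """Fit a forest and return (mean, std) of the per-tree importances."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    per_tree = np.array([t.feature_importances_ for t in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Synthetic data (an assumption for illustration): 30 features, only 5 informative.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=5,
    n_redundant=0, shuffle=False, random_state=0,
)

# Fit on all features.
mean_all, std_all = importance_stats(X, y)

# Keep only the k features with the highest mean importance and refit.
k = 5  # hypothetical cutoff
top_k = np.argsort(mean_all)[::-1][:k]
mean_top, std_top = importance_stats(X[:, top_k], y)

for rank, col in enumerate(top_k):
    print(
        f"feature {col:2d}: all = {mean_all[col]:.3f} +/- {std_all[col]:.3f}, "
        f"subset = {mean_top[rank]:.3f} +/- {std_top[rank]:.3f}"
    )
```

Because the importances of a fitted forest are normalized to sum to 1, the bars in the reduced model are taller simply because there are fewer features to share the total. Comparing the relative spread (`std / mean`) between the two fits is therefore a fairer test of whether the error margin has really shrunk.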