machine-learning, data-science, random-forest, ensemble-learning, boosting

In what scenarios can bagging be preferred over boosting?


I am new to data science, and so far I have learnt that bagging only reduces high variance, while boosting reduces both variance and bias, thus improving accuracy on both the train and test sets.

I understand how both work. It seems that, in terms of accuracy, boosting always performs better than bagging. Please correct me if I am wrong.

Is there any criterion that makes bagging, or bagging-based algorithms, preferable to boosting, whether in terms of memory, speed, handling of complex data, or anything else?


Solution

  • There are two properties of bagging that can make it more attractive than boosting:

    1. It's parallelizable - because each base model is trained independently on its own bootstrap sample, bagging is embarrassingly parallel, so you can speed up training by roughly 4-8x depending on your CPU cores. Boosting, by contrast, is inherently sequential: each model is fit to the errors of the previous one.
    2. Bagging is comparatively more robust to noise (paper). Real-life data are rarely as clean as the toy datasets we play with while learning data science. Boosting has a tendency to overfit to noise, since it keeps increasing the weight of hard (often mislabeled) examples, while bagging handles noise comparatively better.
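The parallelizable property above can be sketched with nothing but the standard library: each base learner fits an independent bootstrap sample, so the fits can be dispatched to a pool concurrently. This is a minimal, hypothetical illustration (a 1-D decision stump as the base model, majority vote as the aggregator), not how production libraries like scikit-learn implement it; there you would simply pass `n_jobs=-1` to `BaggingClassifier`.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy 1-D data: the true label is 1 exactly when x > 5.
random.seed(0)
X = [random.uniform(0, 10) for _ in range(200)]
y = [1 if x > 5 else 0 for x in X]

def bootstrap(X, y):
    """Draw one bootstrap sample (sampling with replacement)."""
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    return [(X[i], y[i]) for i in idx]

def fit_stump(sample):
    """Fit a decision stump: the single threshold minimizing training error."""
    best_t, best_err = None, float("inf")
    for t, _ in sample:
        err = sum((x > t) != (label == 1) for x, label in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Each stump trains on its own independent bootstrap sample, so the fits
# can run concurrently - this is the "embarrassingly parallel" part.
# (A ProcessPoolExecutor would give true multi-core parallelism.)
samples = [bootstrap(X, y) for _ in range(25)]
with ThreadPoolExecutor() as pool:
    thresholds = list(pool.map(fit_stump, samples))

def predict(x):
    """Aggregate the ensemble by majority vote over the stumps."""
    votes = sum(x > t for t in thresholds)
    return 1 if votes > len(thresholds) / 2 else 0

accuracy = sum(predict(x) == label for x, label in zip(X, y)) / len(X)
```

A boosting loop could not be parallelized this way, because sample weights for round *t* depend on the errors of round *t − 1*.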