
Stacking (Stacked Generalization) Algorithm


I'm trying to understand how stacking works, but I'm not sure I have it right. Here is what I understand so far:

  1. we train each of the k base learners (level-0) on the complete data set.

  2. we let each of the k base learners predict on the whole data set.

  3. we create a new data set from all the predictions of the k base learners. The new data set looks like our original data set + the predictions of each base learner.

  4. this data set is used to train the meta learner (level-1).
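The four steps above can be sketched in a few lines of scikit-learn (the data set and the particular model choices here are just placeholders for illustration):

```python
# A sketch of the naive stacking procedure described above (no cross-validation).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# 1. train each of the k base learners (level-0) on the complete data set
base_learners = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
for clf in base_learners:
    clf.fit(X, y)

# 2.-3. let each base learner predict on the whole data set and stack the
# predictions into a new feature matrix (the original features could be
# appended as well)
meta_features = np.column_stack([clf.predict(X) for clf in base_learners])

# 4. train the meta learner (level-1) on the new data set
meta_learner = LogisticRegression()
meta_learner.fit(meta_features, y)
```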

My Questions:

  1. Is this correct so far?
  2. I often read that cross validation is somehow used for stacking, but I could not figure out how it is used. Or is it an essential part that I'm missing?

Many thanks


Solution

  • Your understanding is mostly correct! Regarding

    The new data set looks like our original data set + the predictions of each base learner.

    One could use the original features + the predictions of each base learner, but when people talk about stacking they usually use just the predictions of each base learner.

    I often read that cross validation is somehow used for stacking, but I could not figure out how it is used. Or is it an essential part that I'm missing?

    Yes, cross validation is often used with stacking. If you do it as you described, the meta model (level-1, as you call it) can over-fit to the predictions the base models made, because each prediction comes from a model that has already seen that very data point during training.

    So what you do is cross validation: break the data up into k folds. For each fold, you train the base learners on the other k-1 folds and use their predictions on the held-out fold (cycling through all k of them) to get a (hopefully) unbiased estimate of what each model would predict on unseen data. Then you fit the meta model to those out-of-fold predictions (no cross validation there).
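A minimal sketch of this cross-validated variant, using scikit-learn's `cross_val_predict` to collect the out-of-fold predictions (model choices and data are again illustrative):

```python
# Cross-validated stacking: the meta model is fit on out-of-fold predictions,
# so it never sees a prediction made by a model trained on that same row.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
base_learners = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# For each base learner, every row's prediction comes from a model fit on the
# other k-1 folds, giving an unbiased estimate of behaviour on unseen data.
oof_preds = np.column_stack(
    [cross_val_predict(clf, X, y, cv=5) for clf in base_learners]
)

# Fit the meta model (level-1) to the out-of-fold predictions.
meta_learner = LogisticRegression().fit(oof_preds, y)

# For later use on new data, refit each base learner on the full data set.
for clf in base_learners:
    clf.fit(X, y)
```

At prediction time, the refitted base learners produce the meta-features for a new sample, and the meta learner makes the final call.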