How do the aggregation methods aggregate performance metrics in MLR?

In the mlr R package, there are aggregation methods for the performance metrics computed during parameter tuning, such as train.mean, train.sd, test.mean, and test.sd. I'm wondering how the aggregation is done. As far as I can tell, it is carried out at the fold level.

Say I use 5-fold cross-validation with 10 repeats. There are 10*5 = 50 test error estimates in total, so the standard deviation of the test error is the variation across those 50 estimates. What I want instead is the error estimate at the repeat level: for each repeat, one error estimate averaged over its 5 folds, and then the standard deviation of the test error computed across the 10 per-repeat estimates.
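To make the difference concrete, here is a minimal sketch of the two calculations in plain Python rather than mlr. The error values are randomly generated placeholders, not real results, and the variable names are made up for illustration. It assumes the sample standard deviation (n - 1 denominator) in both cases.

```python
import random
import statistics

random.seed(0)

REPEATS, FOLDS = 10, 5

# Hypothetical per-fold test errors: errors[r][f] is the error of fold f in repeat r.
# These are made-up numbers, standing in for what a resampling run would produce.
errors = [[random.uniform(0.10, 0.30) for _ in range(FOLDS)] for _ in range(REPEATS)]

# Fold-level aggregation: pool all 50 estimates and take their standard deviation.
all_folds = [e for rep in errors for e in rep]
fold_level_sd = statistics.stdev(all_folds)

# Repeat-level aggregation: average the 5 folds within each repeat first,
# then take the standard deviation of the 10 per-repeat means.
repeat_means = [statistics.mean(rep) for rep in errors]
repeat_level_sd = statistics.stdev(repeat_means)

print("fold-level sd over", len(all_folds), "estimates:", round(fold_level_sd, 4))
print("repeat-level sd over", len(repeat_means), "estimates:", round(repeat_level_sd, 4))
```

With equally sized folds, the mean is the same either way; only the standard deviation depends on the level at which the estimates are grouped.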

How can this be done? Is there any way to extract the raw performance metric for each tuning parameter setting in each resampling iteration?

Solution

I found two aggregation functions that seem to work at the repeat level: "testgroup.sd" and "testgroup.mean". From their description in mlr.mlr-org.com/reference/aggregations.html