Viewing data properties after passing through a 'classbalancing' pipeline in mlr3


I am trying to find a way to ensure my mlr3pipelines pipeline is working as expected. I have a classbalancing pipeline and want to view properties of the data that is passed to my model and split for training/testing. I suspect that a much larger portion than I want is being used for training/testing in each iteration.

graph_lr <- po('classbalancing', 
               adjust = 'downsample', 
               reference = 'minor', 
               ratio = 5) %>>% 
  po("encode", method = 'treatment') %>>% 
  po("scale") %>>% 
  lrn("classif.cv_glmnet",
      predict_type = 'prob',
      type.measure = 'auc',
      predict_sets = c("train", "test"))

graphLearner_lr <- GraphLearner$new(graph_lr)

I intend to downsample my major class (binary problem) to 5 times the size of the minor class. The graph learner is then resampled:

lr_resample <- mlr3::resample(task = task_lr, 
                              graphLearner_lr, 
                              outerResamp, 
                              store_models = TRUE,
                              store_backends = TRUE)

How can I view properties of the downsampled data (such as the number of rows, the row indices, etc.)? I have tried looking in the individual learners and elsewhere in the ResampleResult, but have been unable to find anything.


Solution

  • You can set the $keep_results flag of the Graph to TRUE to store the intermediate tasks. Each stored result is a task, and its $data() method returns the data.

    library(mlr3verse)
    library(mlr3learners)
    
    task = tsk("spam")
    
    graph = po("classbalancing", adjust = "downsample", reference = "minor", ratio = 5) %>>% 
      po("encode", method = "treatment") %>>% 
      po("scale") %>>% 
      lrn("classif.cv_glmnet", predict_type = "prob", type.measure = "auc", predict_sets = c("train", "test"))
    graph$keep_results = TRUE
    
    graph_learner = as_learner(graph)
    
    rr = resample(task,  graph_learner, rsmp("cv", folds = 3), store_models = TRUE, store_backends = TRUE)
    
    trained_learner_1 = rr$learners[[1]]
    
    # Task of iteration 1 after class balancing
    trained_learner_1$graph$pipeops$classbalancing$.result$output
    
    # Task of iteration 1 after class balancing and encoding
    trained_learner_1$graph$pipeops$encode$.result$output
    
    # Task of iteration 1 after class balancing, encoding and scaling
    trained_learner_1$graph$pipeops$scale$.result$output
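
As a cross-check that does not depend on resample(), the balancing step can also be trained by hand on the training split of one iteration and the resulting task inspected. The following is only a sketch under a few assumptions: the names resampling, train_task, pob and balanced are made up for illustration, and ratio = 1 is used instead of 5 because in the spam task the major class is already smaller than 5 times the minor class, so ratio = 5 would leave the data unchanged. Because the downsampling is random, the row ids of this fresh draw need not match the ones drawn inside resample(), but the number of rows per class follows from the split and the settings.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)
task = tsk("spam")

# Instantiate the resampling so that the train/test row ids are fixed
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# Training split of iteration 1 as a task of its own
train_ids = resampling$train_set(1)
train_task = task$clone(deep = TRUE)$filter(train_ids)

# ratio = 1 is an assumption for illustration (see the note above)
pob = po("classbalancing", adjust = "downsample", reference = "minor", ratio = 1)

# Training a PipeOp takes a list of inputs and returns a list of outputs
balanced = pob$train(list(train_task))[[1]]

# Before balancing
train_task$nrow
table(train_task$truth())

# After balancing: number of rows, class counts, row ids
balanced$nrow
table(balanced$truth())
head(balanced$row_ids)

# The balanced rows are a subset of the training split
all(balanced$row_ids %in% train_ids)
```

The same fields ($nrow, $row_ids, $truth(), $data()) are available on the tasks stored in .result$output, so the checks carry over to the intermediate tasks kept by the graph.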