r tidyverse rsample

How do you extract values from rsample assessment/test splits?


How do you extract the max and min of the y values for each assessment/test split (the 5 assessment rows in each split; I think they are coded as $data$out_id) created using rsample::rolling_origin()? I'd also like to keep track of the split id. I'd prefer a tidyverse solution if possible.

library(tidyverse)
library(rsample)

# example data
dat <- data.frame(x = 1:50, y = runif(50, 0.1, 30))

# create analysis and assessment partitions
dat_split <- rsample::rolling_origin(dat, initial = 10, assess = 5)

Solution

  • According to the rsample documentation, you can use assessment() to access the assessment partition of an individual split object.

    Thus, you can apply assessment() to each split, unnest the result, and then summarise by split id (the .by argument requires dplyr 1.1.0 or later), as follows:

    library(tidyverse)
    library(rsample)
    
    dat_split %>%
      mutate(assessments = map(splits, assessment)) %>%
      unnest(assessments) %>%
      summarise(min_y = min(y), max_y = max(y), .by = id)
    
    #> # A tibble: 36 × 3
    #>    id      min_y max_y
    #>    <chr>   <dbl> <dbl>
    #>  1 Slice01  2.82  17.6
    #>  2 Slice02  2.82  21.0
    #>  3 Slice03  2.82  21.0
    #>  4 Slice04  1.32  21.0
    #>  5 Slice05  1.32  21.0
    #>  6 Slice06  1.32  21.0
    #>  7 Slice07  1.32  20.8
    #>  8 Slice08  1.32  20.8
    #>  9 Slice09  1.85  20.8
    #> 10 Slice10  1.85  20.8
    #> # ℹ 26 more rows
    #> # ℹ Use `print(n = ...)` to see more rows
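
If you would rather not unnest every assessment row first, the same idea can be sketched by summarising each split directly and binding the results. This is only a minimal sketch under a few assumptions: the set.seed() call, the assess_range name, and the n_assess column are not part of the original question or answer and are added purely for illustration and reproducibility.

```r
library(dplyr)
library(purrr)
library(rsample)

# Assumption: a seed is set so the random y values are reproducible
set.seed(123)
dat <- data.frame(x = 1:50, y = runif(50, 0.1, 30))

# Same partitions as in the question
dat_split <- rolling_origin(dat, initial = 10, assess = 5)

# Build one summary row per split, carrying the split id along
assess_range <- map2(
  dat_split$splits, dat_split$id,
  function(split, id) {
    a <- assessment(split)  # assessment rows of this split only
    tibble(
      id = id,
      n_assess = nrow(a),   # hypothetical sanity-check column
      min_y = min(a$y),
      max_y = max(a$y)
    )
  }
) %>%
  bind_rows()

assess_range
```

The n_assess column is a quick way to confirm that every split really contributes 5 assessment rows before trusting the min and max values.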