Search code examples
rlmbootstrappingtidymodelsrsample

Bootstrap resampling and tidy regression models with grouped/nested data


I am trying to estimate regression slopes and their confidence intervals using bootstrapping. I would like to do it for grouped data. I was following the example at this website (https://www.tidymodels.org/learn/statistics/bootstrap/), but I couldn't figure out how to get it to work with grouped/nested data. I keep getting the following:

Error: Problem with mutate() column model. ℹ model = map(splits, ~lm(conc ~ yday, data = .)). x object 'conc' not found

library(tidyverse)
library(tidymodels)

dat <- 
  structure(list(site = c("mb", "mb", "mb", "mb", "mb", "mb", "mb", 
  "mb", "sp", "sp", "sp", "sp", "sp", "sp", "sp", "sp"), year = c(2015, 
  2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
  2015, 2015, 2015, 2015), yday = c(15, 15, 35, 35, 48, 48, 69, 
  69, 15, 15, 37, 37, 49, 49, 69, 69), samp_depth_cat2 = structure(c(1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Mid-2", 
  "Bottom"), class = "factor"), analyte = c("NO3", "NO3", "NO3", 
  "NO3", "NO3", "NO3", "NO3", "NO3", "NH4", "NH4", "NH4", "NH4", 
  "NH4", "NH4", "NH4", "NH4"), conc = c(44.8171069465267, 44.7775358035268, 
  33.3678662097523, 33.0710828871279, 25.8427604055115, 26.9309658742058, 
  23.7585524380667, 17.5240386949382, 8.35832733633183, 9.29280745341615, 
  10.0797380595417, 10.2322058970515, 13.7930668951239, 15.6226805882773, 
  25.3003042764332, 16.8723637466981)), row.names = c(NA, -16L), class = c("tbl_df", 
  "tbl", "data.frame"))

set.seed(27)

# This is where I get the error
lm_boot <-
  dat %>% 
  group_by(site, year, samp_depth_cat2, analyte) %>% 
  nest() %>% 
  bootstraps(., times = 1000, apparent = TRUE) %>% 
  mutate(model = map(splits, ~lm(conc ~ yday, data = .)),
         coef_info = map(model, tidy))

boot_coefs <- 
  lm_boot %>% 
  unnest(coef_info)

percentile_intervals <- int_pctl(lm_boot, coef_info)
percentile_intervals

Update

I tried mapping the bootstrap function and then do the linear regression on the splits within that list column. It produced a new column called model but there don't appear to be any model elements in there.

lm_boot <-
  dat %>% 
  group_by(site, year, samp_depth_cat2, analyte) %>% 
  nest() %>% 
  mutate(boots = map(data, ~bootstraps(., times = 1000, apparent = TRUE)),
         model = map(boots, "splits", ~lm(conc ~ yday, data = .x)))

Any thoughts?


Solution

  • You can wrap the bootstrapping procedure in a group_modify to apply it to each group.

    library(tidyverse)
    library(tidymodels)
    
    dat <-
      structure(list(site = c("mb", "mb", "mb", "mb", "mb", "mb", "mb",
      "mb", "sp", "sp", "sp", "sp", "sp", "sp", "sp", "sp"), year = c(2015,
      2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
      2015, 2015, 2015, 2015), yday = c(15, 15, 35, 35, 48, 48, 69,
      69, 15, 15, 37, 37, 49, 49, 69, 69), samp_depth_cat2 = structure(c(1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Mid-2",
      "Bottom"), class = "factor"), analyte = c("NO3", "NO3", "NO3",
      "NO3", "NO3", "NO3", "NO3", "NO3", "NH4", "NH4", "NH4", "NH4",
      "NH4", "NH4", "NH4", "NH4"), conc = c(44.8171069465267, 44.7775358035268,
      33.3678662097523, 33.0710828871279, 25.8427604055115, 26.9309658742058,
      23.7585524380667, 17.5240386949382, 8.35832733633183, 9.29280745341615,
      10.0797380595417, 10.2322058970515, 13.7930668951239, 15.6226805882773,
      25.3003042764332, 16.8723637466981)), row.names = c(NA, -16L), class = c("tbl_df",
      "tbl", "data.frame"))
    
    set.seed(27)
    
    dat %>%
      group_by(site, year, samp_depth_cat2, analyte) %>%
      group_modify(
        ~ bootstraps(., times = 100, apparent = TRUE) %>%
          mutate(
            model = map(splits, ~ lm(conc ~ yday, data = .)),
            coefs = map(model, tidy)
          ) %>%
          int_pctl(coefs)
      )
    #> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
    #> `(Intercept)`, `yday`.
    
    #> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
    #> `(Intercept)`, `yday`.
    #> # A tibble: 4 × 10
    #> # Groups:   site, year, samp_depth_cat2, analyte [2]
    #>   site   year samp_depth_cat2 analyte term        .lower .estimate .upper .alpha
    #>   <chr> <dbl> <fct>           <chr>   <chr>        <dbl>     <dbl>  <dbl>  <dbl>
    #> 1 mb     2015 Mid-2           NO3     (Intercept) 39.1      49.5   53.2     0.05
    #> 2 mb     2015 Mid-2           NO3     yday        -0.563    -0.443 -0.241   0.05
    #> 3 sp     2015 Bottom          NH4     (Intercept) -5.60      3.40   6.68    0.05
    #> 4 sp     2015 Bottom          NH4     yday         0.138     0.236  0.420   0.05
    #> # … with 1 more variable: .method <chr>