How can I pass values from columns of a parent dataframe to operate on a nested dataframe two levels below?


My tibble shocks_small is structured with several regular columns of parameter values including mu1, and several list columns prefixed with v1_v2_T. Each observation of these list columns is another tibble, with a regular column indicating the simulation number and another list column v1_v2 containing a tibble of random variables (v1 and v2).

Is it possible to pass the parameter values from the parent tibble down two levels of depth to a function operating on the random variables tibble? The goal is to iterate through each row of the parent tibble, then through each simulation within that row.

This code works, because it does not require any values from shocks_small:

shocks_small %>%
  modify_at(vars(starts_with("v1_v2_T")), \(z)
            z %>% map(\(x)
                      x %>%
                        mutate(v1_v2 = map(v1_v2, \(y) 
                                           y %>% mutate(e1 = lag(v1, 1))))))

This code, however, throws an error, presumably because the innermost map-mutate cannot find the mu1 value:

shocks_small %>%
  modify_at(vars(starts_with("v1_v2_T")), \(z)
            z %>% map(\(x)
                      x %>%
                        mutate(v1_v2 = map(v1_v2, \(y) 
                                           y %>% mutate(e1 = mu1 + lag(v1, 1))))))

Error in `map()`:
ℹ In index: 1.
ℹ With name: v1_v2_T12.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `mutate()`:
ℹ In argument: `v1_v2 = map(v1_v2, function(y) y %>% mutate(e1 = mu1 + lag(v1, 1)))`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `mutate()`:
ℹ In argument: `e1 = mu1 + lag(v1, 1)`.
Caused by error:
! object 'mu1' not found
Backtrace:
  1. shocks_small %>% ...
 35. dplyr:::mutate.data.frame(., e1 = mu1 + lag(v1, 1))
 36. dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
 38. dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 39. mask$eval_all_mutate(quo)
 40. dplyr (local) eval()

Is there a way to reach back up to shocks_small from the innermost map(), or to pass the mu1 value along on each iteration? I'd appreciate any suggestions.

Update: Here is code that builds a simplified dataset compatible with the code above:

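library(tidyverse)  # loads purrr, dplyr, and tidyr, which the code here relies on

# 5 replications of 5 draws each, nested per replication
intermediate_tbl1 <- map_dfr(1:5, ~ map_dfc(1:2, ~ runif(5)) %>%
    set_names(c("v1", "v2")) %>%
    mutate(repl = .x) %>%
    relocate(repl, .before = everything())) %>%
  nest(v1_v2 = c(v1, v2))

# 5 replications of 10 draws each, nested per replication
intermediate_tbl2 <- map_dfr(1:5, ~ map_dfc(1:2, ~ runif(10)) %>%
    set_names(c("v1", "v2")) %>%
    mutate(repl = .x) %>%
    relocate(repl, .before = everything())) %>%
  nest(v1_v2 = c(v1, v2))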

parameter <- tibble(mu1 = c(1, 2))

shocks_small <- parameter %>% mutate(
  v1_v2_T = list(intermediate_tbl1, intermediate_tbl2)
)

Solution

  • I'll demo using mutate(across(..), ..), because I think modify_at is "hiding" some of the surrounding context: its function only receives the list column, so the other columns of shocks_small (including mu1) are not in scope. The procedure is otherwise roughly the same. The key is that each nested tibble needs exactly one mu1 value, and even where mu1 is visible it is the whole column (length 2 here) rather than the single value for the current row. To pair each nested tibble with its own mu1, I'll iterate with pmap instead of map.

    out <- shocks_small %>%
      mutate(
        across(starts_with("v1_v2_T"),
               \(z) pmap(list(z, mu1), \(x, mu)
                         mutate(x, v1_v2 =
                                     map(v1_v2, \(y) mutate(y, e1 = mu + lag(v1, 1))))) )
      )
    out$v1_v2_T[[1]]$v1_v2[[1]]
    # # A tibble: 5 × 3
    #       v1     v2    e1
    #    <dbl>  <dbl> <dbl>
    # 1 0.664  0.163  NA   
    # 2 0.216  0.637   1.66
    # 3 0.0754 0.0446  1.22
    # 4 0.572  0.838   1.08
    # 5 0.208  0.564   1.57
    

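    Once pmap has bound mu to a single (length-1) value at the first level, the lower levels can see it: the inner anonymous functions are defined inside the outer one, so R's lexical scoping makes its arguments available to them.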
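
Since only two things are iterated in parallel here (the list column and mu1), map2 should work the same way as pmap. Below is a minimal self-contained sketch of that variant, with a check on the second parent row (mu1 = 2). The helper make_nested and the fixed seed are assumptions added for illustration; they are not part of the original question.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)  # assumption: fixed seed only so the sketch is repeatable

# Hypothetical helper: n_repl replications of n_obs uniform draws,
# nested per replication like the intermediate tibbles in the question.
make_nested <- function(n_obs, n_repl = 5) {
  map(seq_len(n_repl), \(r) tibble(repl = r, v1 = runif(n_obs), v2 = runif(n_obs))) %>%
    bind_rows() %>%
    nest(v1_v2 = c(v1, v2))
}

shocks_small <- tibble(mu1 = c(1, 2)) %>%
  mutate(v1_v2_T = list(make_nested(5), make_nested(10)))

out <- shocks_small %>%
  mutate(across(
    starts_with("v1_v2_T"),
    # map2 walks the list column and mu1 in parallel, so mu is one value per row
    \(z) map2(z, mu1, \(x, mu)
      mutate(x, v1_v2 = map(v1_v2, \(y) mutate(y, e1 = mu + lag(v1, 1)))))
  ))

# Second parent row, first simulation: e1 should be 2 + lagged v1
inner <- out$v1_v2_T[[2]]$v1_v2[[1]]
stopifnot(isTRUE(all.equal(inner$e1, 2 + lag(inner$v1, 1))))
```

If more parent columns need to travel down (say a hypothetical mu2), pmap(list(z, mu1, mu2), \(x, mu1, mu2) ...) extends the same pattern.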