How to pass the name of a variable to a custom function from `purrr::map()`?


I want to have a tibble that contains several nearly identical tibbles. In each nested tibble, one variable's contents are shuffled, the others are unchanged. I want to have N (e.g. 10) versions of each shuffled tibble.

library(tidyverse)

df <- tibble(a = 1:5, b = 6:10, c = 11:15)
permutation_n <- 10
tbl_names <- names(df)

shuffle_var <- function(data, var) {
  data %>% 
    mutate({{var}} := sample({{var}}))
}

> shuffle_var(df, a)
# A tibble: 5 × 3
      a     b     c
  <int> <int> <int>
1     4     6    11
2     2     7    12
3     1     8    13
4     3     9    14
5     5    10    15

shuffle_var() works correctly if I use it separately.

But if I use it inside map(), it returns a tibble with the original variable unchanged and a new variable called .x:

result <- 
  crossing(variable = tbl_names,
         index = 1:permutation_n) %>% 
  mutate(data = map(variable, ~shuffle_var(df, .x)),
  )

result %>% slice(1) %>% pull(data) %>% .[[1]]

# A tibble: 5 × 4
      a     b     c .x   
  <int> <int> <int> <chr>
1     1     6    11 a    
2     2     7    12 a    
3     3     8    13 a    
4     4     9    14 a    
5     5    10    15 a  

result should contain 30 rows (3 variables × 10 versions). In each nested tibble, the values of one variable should be shuffled, and each variable should have 10 different shuffled versions.

I have found several similar examples on SO, but none of them helped with this specific issue.


Solution

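  • Try it with the .data pronoun inside the function:

    1. .data is the pronoun for the data frame that mutate() is currently working on. It is not the object that map() iterates over; map() iterates over the column names in variable, so var arrives as a string such as "a".
    2. Inside the function, .data[[var]] then refers to the wanted column by name, and map() calls shuffle_var() once for each row of the crossing() result.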
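    shuffle_var <- function(data, var) {
      data %>% 
        # var is a string, so name the result with "{var}" (or !!var);
        # {{var}} would capture the expression .x and create a column called .x
        mutate("{var}" := sample(.data[[var]]))
    }
    
    result <- 
      crossing(variable = tbl_names,
               index = 1:permutation_n) %>% 
      mutate(data = map(variable, ~shuffle_var(df, .x)))
    
    result %>% 
      slice(1) %>% pull(data) %>% .[[1]]
    
    # A tibble: 5 × 3
          a     b     c
      <int> <int> <int>
    1     1     6    11
    2     5     7    12
    3     3     8    13
    4     4     9    14
    5     2    10    15
    
    The order of a is random, so it will differ from run to run.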
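
As an alternative sketch (not part of the original answer), the column can be picked by its string name with across(all_of(var), sample). This overwrites the column in place, so no := naming is needed at all. It assumes dplyr 1.0 or later for across(); the three checks at the end are only illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(a = 1:5, b = 6:10, c = 11:15)
permutation_n <- 10

# var is a column name passed as a string, e.g. "a"
shuffle_var <- function(data, var) {
  data %>%
    # all_of() selects the column named in var; sample() permutes its values
    mutate(across(all_of(var), sample))
}

result <-
  crossing(variable = names(df),
           index = 1:permutation_n) %>%
  mutate(data = map(variable, ~ shuffle_var(df, .x)))

# Illustrative checks on the first nested tibble (variable "a", index 1)
first <- result$data[[1]]
nrow(result)              # 30 rows: 3 variables x 10 versions
names(first)              # "a" "b" "c", with no extra .x column
setequal(first$a, df$a)   # TRUE: a holds the same values in a new order
```

Because sample() is called once per row of result, each of the 10 versions of a variable is drawn independently; with only 5 values, two versions can occasionally coincide by chance.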