r dplyr purrr

R dplyr function ceases to work when using map to iterate over vector of column names


I've got a large dataset and want to apply a custom function over each of the columns.

I've written the function and it works when applied as a one-off on one column of the target dataset.

However, when I iterate with purrr::map, the custom function throws an error partway through. The failure comes from the select statement that references the column I want to use: it reports that it can't reference columns that don't exist; Column `var1` doesn't exist.

I've put a reproducible example below. The actual data is thousands of columns wide.

Ultimately, I want to get an output where the relative risk of the outcome variable is listed against each column in the dataset:

column_name  rr_0  rr_1
var1         1     1.17
var2         1     1.03
var3         ...   ...

Reproducible example

library(dplyr)
library(purrr)
library(tidyr)
set.seed(1)
# sample dataset
test_dat <- data.frame(var1 = rbinom(n = 10, size = 1, prob = 0.3), 
                       var2 = rbinom(n = 10, size = 1, prob = 0.1),
                       var3 = rbinom(n = 10, size = 1, prob = 0.4),
                       outcome = rbinom(n = 10, size = 1, prob = 0.3))
test_dat

# get names of columns to iterate
over_vec <- names(test_dat)
over_vec <- over_vec[!(over_vec %in% c("outcome"))]
over_vec

# function I want to use
test_fun <- function(code, dataset){
  dataset <- dataset %>% 
    group_by({{code}}) %>% 
    summarise(n = n(), n_out = sum(outcome)) %>% 
    mutate(risk = n_out/n * 100, 
           rr = risk / risk[row_number() == 1])  %>%
    dplyr::select({{code}}, rr) %>%
    pivot_wider(names_from = {{code}}, values_from = rr)
  return(dataset)
}

# works for one column
test_fun(code = var1, dataset = test_dat)

# fails when iterated with purrr::map
output <- over_vec %>% 
  map(.x = ., .f = test_fun, dataset = test_dat)
output

Solution

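  • You are passing the column names as a character vector, but your function is written to accept a bare symbol as its first argument: note that the call that works is code = var1, not code = "var1". With a string, {{code}} in group_by() creates a constant column rather than grouping by var1, so the later select() cannot find a column named var1.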

    You could just convert over_vec to a list of symbols:

    over_vec %>% 
      lapply(as.symbol) %>%
      map(.x = ., .f = test_fun, dataset = test_dat)
    #> [[1]]
    #> # A tibble: 1 x 2
    #>     `0`   `1`
    #>   <dbl> <dbl>
    #> 1     1  1.17
    #> 
    #> [[2]]
    #> # A tibble: 1 x 2
    #>     `0`   `1`
    #>   <dbl> <dbl>
    #> 1     1     0
    #> 
    #> [[3]]
    #> # A tibble: 1 x 2
    #>     `0`   `1`
    #>   <dbl> <dbl>
    #> 1     1  1.17
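
If you would rather keep over_vec as plain strings, one possible alternative is to write the function for string input and look the column up with the .data pronoun. The sketch below is an assumption layered on top of the answer above, not part of it: test_fun_chr and the helper column level are hypothetical names, and names_prefix = "rr_" is used only to approximate the rr_0 / rr_1 layout asked for in the question.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
# same sample dataset as in the question
test_dat <- data.frame(var1 = rbinom(n = 10, size = 1, prob = 0.3),
                       var2 = rbinom(n = 10, size = 1, prob = 0.1),
                       var3 = rbinom(n = 10, size = 1, prob = 0.4),
                       outcome = rbinom(n = 10, size = 1, prob = 0.3))

# hypothetical string-based variant of test_fun:
# `code` is a column name given as a character string
test_fun_chr <- function(code, dataset) {
  dataset %>%
    # .data[[code]] looks the column up by name; `level` is a helper column
    group_by(level = .data[[code]]) %>%
    summarise(n = n(), n_out = sum(outcome)) %>%
    mutate(risk = n_out / n * 100,
           rr = risk / first(risk)) %>%   # risk relative to the first level
    select(level, rr) %>%
    pivot_wider(names_from = level, values_from = rr, names_prefix = "rr_")
}

# names of the columns to iterate over, still as strings
over_vec <- setdiff(names(test_dat), "outcome")

output <- over_vec %>%
  set_names() %>%                          # name each element after itself
  map(test_fun_chr, dataset = test_dat) %>%
  bind_rows(.id = "column_name")           # one row per input column

output
```

Because the list passed to bind_rows() is named, the .id argument turns those names into the column_name column, giving a single data frame instead of a list of one-row tibbles. If a column happens to contain only one level, bind_rows() fills the missing rr_ column with NA for that row.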