
Loop over character vectors and use elements as column names within lambda function


I would like to loop over a vector of variable names with purrr, then use the variables inside a function with dplyr, as with the following code:

library(dplyr)
library(purrr)

#creating index
index<-c('Sepal.Length', 'Sepal.Width')

#mapping over index with lambda function
map(index, ~iris %>% filter (.x > mean(.x)))

I was expecting to see a list of two data.frames, as in

list(Sepal.Length = iris %>% filter (Sepal.Length > mean(Sepal.Length)),
     Sepal.Width = iris %>% filter (Sepal.Width > mean(Sepal.Width)))

Is there a way to use the .x variables as column names within the data.frames in the lambda function?

I think this has to do with data masking and non-standard evaluation, and I suspect rlang could help, but I am not familiar with the subject. Thank you.


Solution

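  • The elements of index are strings, so inside filter() .x is treated as a literal string rather than a column name. Convert each string to a symbol with rlang::sym() and unquote it with !! so that filter() evaluates it as a column: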

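    library(purrr)
    library(dplyr)

    index <- c('Sepal.Length', 'Sepal.Width')

    # sym() turns each string into a symbol; !! unquotes it inside filter()
    out <- map(index, ~ iris %>%
           filter(!! rlang::sym(.x) > mean(!! rlang::sym(.x))))
    # map() returns an unnamed list here, so name it after the columns
    names(out) <- index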
    

    Output:

    > str(out)
    List of 2
     $ Sepal.Length:'data.frame':   70 obs. of  5 variables:
      ..$ Sepal.Length: num [1:70] 7 6.4 6.9 6.5 6.3 6.6 5.9 6 6.1 6.7 ...
      ..$ Sepal.Width : num [1:70] 3.2 3.2 3.1 2.8 3.3 2.9 3 2.2 2.9 3.1 ...
      ..$ Petal.Length: num [1:70] 4.7 4.5 4.9 4.6 4.7 4.6 4.2 4 4.7 4.4 ...
      ..$ Petal.Width : num [1:70] 1.4 1.5 1.5 1.5 1.6 1.3 1.5 1 1.4 1.4 ...
      ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
     $ Sepal.Width :'data.frame':   67 obs. of  5 variables:
      ..$ Sepal.Length: num [1:67] 5.1 4.7 4.6 5 5.4 4.6 5 4.9 5.4 4.8 ...
      ..$ Sepal.Width : num [1:67] 3.5 3.2 3.1 3.6 3.9 3.4 3.4 3.1 3.7 3.4 ...
      ..$ Petal.Length: num [1:67] 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.5 1.5 1.6 ...
      ..$ Petal.Width : num [1:67] 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.1 0.2 0.2 ...
      ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    

    Checking against the OP's expected result:

    > expected <- list(Sepal.Length = iris %>% filter (Sepal.Length > mean(Sepal.Length)),
    +      Sepal.Width = iris %>% filter (Sepal.Width > mean(Sepal.Width)))
    > 
    > identical(out, expected)
    [1] TRUE
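
    The naming step can also be folded into the pipeline. The following is a minimal sketch, not part of the original answer: it assumes purrr's set_names(), which names each element after itself when called without a second argument, and a hypothetical helper called filter_above_mean().

    ```r
    library(dplyr)
    library(purrr)

    index <- c("Sepal.Length", "Sepal.Width")

    # Hypothetical helper: keep the rows of iris where the column named by
    # `col` (a string) is above that column's mean.
    filter_above_mean <- function(col) {
      col_sym <- rlang::sym(col)  # string -> symbol
      iris %>% filter(!!col_sym > mean(!!col_sym))
    }

    # set_names() names each element after itself; map() keeps those names
    out <- index %>%
      set_names() %>%
      map(filter_above_mean)

    names(out)
    ```

    Because the names travel with the input vector, no separate names(out) <- index assignment is needed.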
    

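    Or subset the current data with cur_data(). Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick(), so this form emits a deprecation warning on recent versions: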

    map(index, ~ iris %>%
         filter(cur_data()[[.x]] > mean(cur_data()[[.x]])))
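
    A similar effect can be had with the .data pronoun, which subsets the data mask by string. This is a sketch of that alternative, not something taken from the original answer:

    ```r
    library(dplyr)
    library(purrr)

    index <- c("Sepal.Length", "Sepal.Width")

    # .data[[.x]] looks up the column whose name is stored in the string .x
    out <- map(index, ~ iris %>%
                 filter(.data[[.x]] > mean(.data[[.x]])))
    names(out) <- index

    sapply(out, nrow)
    ```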
    

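    Or use across() or if_all(), which accept the column name directly as a string via all_of(). Recent dplyr versions deprecate across() inside filter() in favor of if_all()/if_any():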

    map(index, ~ iris %>%
               filter(across(all_of(.x), ~ . > mean(.))))
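
    For reference, a sketch of the if_all() form (available from dplyr 1.0.4). It uses an explicit function(col) for the outer loop, an assumption made here only to keep the outer argument from being confused with the inner lambda's .x:

    ```r
    library(dplyr)
    library(purrr)

    index <- c("Sepal.Length", "Sepal.Width")

    # Outer function receives the column name as a string; the inner lambda's
    # .x is the selected column itself.
    out <- map(index, function(col) {
      iris %>% filter(if_all(all_of(col), ~ .x > mean(.x)))
    })
    names(out) <- index

    sapply(out, nrow)
    ```

    With a single selected column, if_all() and if_any() give the same result; they differ only when all_of() selects several columns at once.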