Creating new variable with dplyr::mutate based on multiple conditions and corresponding variable names passed by string vector (or tidyselect)


I'm pretty sure this has been discussed before, but I'm struggling to put the problem into words. For example, I would like to create this data frame...

iris %>%
    mutate(has_petal_1.4 = Petal.Length == 1.4 | Petal.Width == 1.4,
           width_greater_1 = Sepal.Width > 1 & Petal.Width > 1)

...without having to name the variables in the conditions explicitly. Is there a way to pass the variable names using a string vector? Unfortunately, this doesn't seem to work:

varsel <- c('Petal.Length', 'Petal.Width')
iris %>%
  mutate(has_petal_1.4 = 1.4 %in% c(!!! syms(varsel)))

Moreover, I wonder whether there is a solution using tidyselect within the mutate() function. So far, I have used the new and handy across() function to mutate multiple variables. Is it possible to use it for conditions as well? Here is another example that doesn't work:

iris %>%
  mutate(has_petal_1.4 = across(c(starts_with('Petal')), function(x) {1.4 %in% x}))

Any help is highly appreciated.


Solution

  • There are multiple ways to do this. One option is c_across() combined with rowwise():

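    library(dplyr) # >= 1.0.0
    varsel <- c('Petal.Length', 'Petal.Width')  # as defined in the question
    iris %>% 
        rowwise() %>%   # evaluate the conditions one row at a time
        # all_of() marks varsel as an external character vector of column names
        mutate(has_petal_1.4 = any(c_across(all_of(varsel)) == 1.4),
               width_greater_1 = all(c_across(ends_with('Width')) > 1)) %>%
        ungroup()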
    # A tibble: 150 x 7
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species has_petal_1.4 width_greater_1
    #          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <lgl>         <lgl>          
    # 1          5.1         3.5          1.4         0.2 setosa  TRUE          FALSE          
    # 2          4.9         3            1.4         0.2 setosa  TRUE          FALSE          
    # 3          4.7         3.2          1.3         0.2 setosa  FALSE         FALSE          
    # 4          4.6         3.1          1.5         0.2 setosa  FALSE         FALSE          
    # 5          5           3.6          1.4         0.2 setosa  TRUE          FALSE          
    # 6          5.4         3.9          1.7         0.4 setosa  FALSE         FALSE          
    # 7          4.6         3.4          1.4         0.3 setosa  TRUE          FALSE          
    # 8          5           3.4          1.5         0.2 setosa  FALSE         FALSE          
    # 9          4.4         2.9          1.4         0.2 setosa  TRUE          FALSE          
    #10          4.9         3.1          1.5         0.1 setosa  FALSE         FALSE          
    # … with 140 more rows
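
    The question also asks whether a tidyselect-style helper can express the conditions directly inside mutate(). A minimal sketch, assuming dplyr >= 1.0.4 is installed (the release that introduced if_any() and if_all()); the name res_if is only illustrative and not part of the original answer:

```r
library(dplyr)  # if_any() / if_all() require dplyr >= 1.0.4 (assumption)

varsel <- c("Petal.Length", "Petal.Width")

res_if <- iris %>%
  mutate(
    # TRUE where at least one of the selected columns equals 1.4 in that row
    has_petal_1.4 = if_any(all_of(varsel), ~ .x == 1.4),
    # TRUE where every column ending in "Width" exceeds 1 in that row
    width_greater_1 = if_all(ends_with("Width"), ~ .x > 1)
  )

head(res_if[, c("has_petal_1.4", "width_greater_1")])
```

    Because the predicate is applied to whole columns and the results are combined element-wise, no rowwise() step should be needed here.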
    

    Or a faster, vectorised option with rowSums, which avoids the row-by-row evaluation of rowwise():

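    iris %>%
        # count the matching columns per row; all_of() marks varsel as a character vector of names
        mutate(has_petal_1.4 = rowSums(select(., all_of(varsel)) == 1.4) > 0,
               # 2 = number of columns ending in 'Width', so every one of them must exceed 1
               width_greater_1 = rowSums(select(., ends_with('Width')) > 1) == 2)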
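
    As for the first attempt in the question: 1.4 %in% c(Petal.Length, Petal.Width) asks whether 1.4 occurs anywhere in the two columns taken together, so it yields a single value that is recycled to every row instead of one value per row. A base R sketch of the element-wise alternative, driven by the same string vector (the names hits, width_cols and iris_flagged are illustrative, not from the original answer):

```r
varsel <- c("Petal.Length", "Petal.Width")

# One logical vector per selected column: does the column equal 1.4 in each row?
hits <- lapply(iris[varsel], function(x) x == 1.4)

# Combine the per-column vectors element-wise with OR
has_petal_1.4 <- Reduce(`|`, hits)

# Same idea with AND for the "every width greater than 1" condition
width_cols <- grep("Width$", names(iris), value = TRUE)
width_greater_1 <- Reduce(`&`, lapply(iris[width_cols], function(x) x > 1))

# Attach the results as new columns
iris_flagged <- iris
iris_flagged$has_petal_1.4 <- has_petal_1.4
iris_flagged$width_greater_1 <- width_greater_1
```

    This avoids hard-coding the number of selected columns, which the rowSums(...) == 2 comparison relies on.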