Search code examples
rrulesrowwise

Efficient way to implement rule-based value assignment


I'm trying to come up with an elegant, rule-based way to assign codes to rows in a data frame based on combinations of values in columns, using this data:

library(tidyr)
df <- crossing(yr2018=c("M","S","W"),
               yr2019=c("M","S","W"),
                yr2020=c("M","S","W")) %>%
  print(n=27)

# A tibble: 27 × 3
   yr2018 yr2019 yr2020
   <chr>  <chr>  <chr> 
 1 M      M      M     
 2 M      M      S     
 3 M      M      W     
 4 M      S      M     
 5 M      S      S     
 6 M      S      W     
 7 M      W      M     
 8 M      W      S     
 9 M      W      W     
10 S      M      M     
11 S      M      S     
12 S      M      W     
13 S      S      M     
14 S      S      S     
15 S      S      W     
16 S      W      M     
17 S      W      S     
18 S      W      W     
19 W      M      M     
20 W      M      S     
21 W      M      W     
22 W      S      M     
23 W      S      S     
24 W      S      W     
25 W      W      M     
26 W      W      S     
27 W      W      W     
>

What I want to end up with is a column with codes applied with rules such the following:

  • if all 3 values in yr2018, yr2019, and yr2020 are the same (MMM, SSS, or WWW), then set the new column value to the concatenation of "CON" and whatever the unique value is, so either "CONM", "CONS", or "CONW".
  • if the first and third columns are the same, but the second is different, then concatenate the two unique values together as exactly "MS","MW", or "SW", in that order, depending on which two unique values are in the row, regardless of the order of the values in the columns.
  • if all three are different, regardless of order, then "MSW"
  • if the last two are the same, but different from the first, then concatenate "CON" with the last value, so either "CONM", "CONS", or "CONW"
  • lastly, if the first two are the same and the last different, then concatenate "CON" with the first column, so either "CONM", "CONS", or "CONW"

This feels like a big, ugly if statement, but I'm hoping for something more elegant, especially since my real data is actually 4x5 (625 rows). It also feels like maybe regular expressions, which I struggle with.

I started looking into row-wise functions and found rowwise() as a start to logically reconfigure the data frame, but it looks like the number of functions that can operate that way are limited.

All guidance welcome!


Solution

  • You can use mutate and case_when to efficiently satisfy these conditions. sort in the second logic will organize the letters as you described.

    Since case_when evaluates iteratively, you may be able to parse this down to make it more elegant, but as written it should follow your exact conditions:

    library(dplyr)
    
    df %>%
      rowwise() %>%
      mutate(new_column = case_when(
        yr2018 == yr2019 & yr2019 == yr2020 ~ paste0("CON", yr2018),
        yr2018 == yr2020 ~ paste(sort(c(yr2019, yr2020)), collapse = ""),
        yr2018 != yr2019 & yr2019 != yr2020 & yr2018 != yr2020 ~ "MSW",
        yr2019 == yr2020 & yr2018 != yr2020 ~ paste0("CON", yr2020),
        yr2018 == yr2019 & yr2018 != yr2020 ~ paste0("CON", yr2018)
      )) 
    

    Output:

       yr2018 yr2019 yr2020 new_column
       <chr>  <chr>  <chr>  <chr>     
     1 M      M      M      CONM      
     2 M      M      S      CONM      
     3 M      M      W      CONM      
     4 M      S      M      MS        
     5 M      S      S      CONS      
     6 M      S      W      MSW       
     7 M      W      M      MW        
     8 M      W      S      MSW       
     9 M      W      W      CONW      
    10 S      M      M      CONM      
    11 S      M      S      MS        
    12 S      M      W      MSW       
    13 S      S      M      CONS      
    14 S      S      S      CONS      
    15 S      S      W      CONS      
    16 S      W      M      MSW       
    17 S      W      S      SW        
    18 S      W      W      CONW      
    19 W      M      M      CONM      
    20 W      M      S      MSW       
    21 W      M      W      MW        
    22 W      S      M      MSW       
    23 W      S      S      CONS      
    24 W      S      W      SW        
    25 W      W      M      CONW      
    26 W      W      S      CONW      
    27 W      W      W      CONW