Search code examples
rdplyrtidyversepurrr

Mutate case_when nested conditional labelling


I have a data frame with multiple observations per ID. Each ID has performed many tests and for each test, has been classified into a tertile rank (top, mid, bottom) based on their test performance. This rank can vary at each time point.

Dummy df looks like this:

      df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
        time = c(1,2,3,1,2,3,1,2,3,1,2,3),
        test1_rank <- c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank <- c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
    
     ID  time `test1_rank <- ...` `test2_rank <- ...`
   <dbl> <dbl> <chr>               <chr>              
 1     1     1 top                 bottom             
 2     1     2 top                 bottom             
 3     1     3 top                 bottom             
 4     2     1 top                 top                
 5     2     2 mid                 mid                
 6     2     3 bottom              bottom             
 7     3     1 bottom              top                
 8     3     2 bottom              top                
 9     3     3 bottom              top                
10     4     1 top                 top                
11     4     2 bottom              bottom             
12     4     3 bottom              bottom  

I want to create a new variable where I classify each individual based on how their rank changes with time for each test. Specifically, for each test:

  • If rank is the same through all three time points, they get a label "stable" (can be stable(top) or stable(bottom) based on whether the rank is "top" or "bottom" across all time points).
  • If the rank changes from top (time 1) to mid (time 2) to bottom (time 3), they get a label of "gradual".
  • If rank declines from top (time 1) to bottom (time 2 and time 3), they get a label of "rapid".
  • Any other combination gets the label "Other".

Desired data frame:

  df2 <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
                  time = c(1,2,3,1,2,3,1,2,3,1,2,3),
                  test1_rank <- c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank <- c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"),
                  test1_rankgroup <- c("stable(top)", "stable(top)", "stable(top)", "gradual", "gradual", "gradual", "stable(bottom)", "stable(bottom)", "stable(bottom)", "rapid", "rapid", "rapid"), 
test2_rankgroup <- c("stable(bottom)", "stable(bottom)", "stable(bottom)", "gradual", "gradual", "gradual", "stable(top)", "stable(top)", "stable(top)", "rapid", "rapid", "rapid"))

        ID  time `test1_rank <- ...` `test2_rank <- ...` `test1_rankgroup <- ...` test2_ra…¹
   <dbl> <dbl> <chr>               <chr>               <chr>                    <chr>     
 1     1     1 top                 bottom              stable(top)              stable(bo…
 2     1     2 top                 bottom              stable(top)              stable(bo…
 3     1     3 top                 bottom              stable(top)              stable(bo…
 4     2     1 top                 top                 gradual                  gradual   
 5     2     2 mid                 mid                 gradual                  gradual   
 6     2     3 bottom              bottom              gradual                  gradual   
 7     3     1 bottom              top                 stable(bottom)           stable(to…
 8     3     2 bottom              top                 stable(bottom)           stable(to…
 9     3     3 bottom              top                 stable(bottom)           stable(to…
10     4     1 top                 top                 rapid                    rapid     
11     4     2 bottom              bottom              rapid                    rapid     
12     4     3 bottom              bottom              rapid                    rapid 

What is the easiest way to solve this in dplyr using mutate and case_when?


Solution

  • You can use mutate with across to apply your case_when to both of your ranked columns.

    Your case_when can use n_distinct to see if the same value is held across your 3 time points. You can also include specific logic for 1st, 2nd, and 3rd values for a labelled group value.

    The .default argument can be used for "other". The .by argument will perform the mutate grouped by ID.

    library(tidyverse)
    
    df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 time = c(1,2,3,1,2,3,1,2,3,1,2,3),
                 test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
                 test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
    
    df %>%
      mutate(across(-time,
                   ~ case_when(n_distinct(.) == 1 ~ paste("stable", .),
                               .[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
                               .[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
                               .default = "other"), 
                   .names = "{.col}_group"),
             .by = ID)
    

    Output

          ID  time test1_rank test2_rank test1_rank_group test2_rank_group
       <dbl> <dbl> <chr>      <chr>      <chr>            <chr>           
     1     1     1 top        bottom     stable top       stable bottom   
     2     1     2 top        bottom     stable top       stable bottom   
     3     1     3 top        bottom     stable top       stable bottom   
     4     2     1 top        top        gradual          gradual         
     5     2     2 mid        mid        gradual          gradual         
     6     2     3 bottom     bottom     gradual          gradual         
     7     3     1 bottom     top        stable bottom    stable top      
     8     3     2 bottom     top        stable bottom    stable top      
     9     3     3 bottom     top        stable bottom    stable top      
    10     4     1 top        top        rapid            rapid           
    11     4     2 bottom     bottom     rapid            rapid           
    12     4     3 bottom     bottom     rapid            rapid