Search code examples
rdata-manipulationdata-transform

r create a new Indicator column based on combination of variables


I have a dataset with two colums, ID, Group.

ID       Group
10       Red
11       Red
13       Blue
13       Red
15       Blue
15       Blue
17       Blue
17       Red
17       Red
19       Blue
19       Undecided
23       Blue
23       Undecided
23       Undecided       

My goal is to create a third indicator column that takes a value based on the combination of values in ID and Group. Expecting a third column , Indicator like this.

ID       Group          Indicator
10       Blue           Single
11       Blue           Single

13       Blue           Blue-Red
13       Red            Blue-Red

15       Blue           Blue-Blue
15       Blue           Blue-Blue

17       Blue           Blue-Red
17       Red            Blue-Red
17       Red            Blue-Red  

19       Blue           Blue-Undecided  
19       Undecided      Blue-Undecided 

23       Blue           Blue-Undecided 
23       Undecided      Blue-Undecided   
23       Undecided      Blue-Undecided 

I tried case_when but it became too complicated and gave up. Any help in much appreciated. Thanks.


Solution

  • Here's a case_when version using dplyr that doesn't have too many cases

    library(dplyr)
    dd  %>% 
      mutate(Indicator = case_when(
          n()==1 ~ "Single",
          n_distinct(Group)==1 ~ paste(rep(first(Group),2), collapse="-"),
          TRUE ~ paste(sort(unique(Group)), collapse="-")
        ), .by=ID)
    

    which produces

       ID     Group      Indicator
    1  10       Red         Single
    2  11       Red         Single
    3  13      Blue       Blue-Red
    4  13       Red       Blue-Red
    5  15      Blue      Blue-Blue
    6  15      Blue      Blue-Blue
    7  17      Blue       Blue-Red
    8  17       Red       Blue-Red
    9  17       Red       Blue-Red
    10 19      Blue Blue-Undecided
    11 19 Undecided Blue-Undecided
    12 23      Blue Blue-Undecided
    13 23 Undecided Blue-Undecided
    14 23 Undecided Blue-Undecided
    

    tested with

    dd <- data.frame(
      ID = c(10L, 11L, 13L, 13L, 15L, 15L, 17L, 17L, 17L, 19L, 19L, 23L, 23L, 23L), 
      Group = c("Red", "Red", "Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue", "Undecided", "Blue", "Undecided", "Undecided")
    )