
R: determine if more than one value exists in a column, and concatenate with mutate if true


I couldn't quite find a solution to this one, so I'm looking for some guidance.

Here is a sample of my data set:

     ID   Rank    Date       Date2       Group   Group2
1   5678   1    2000-01-01   2010-05-02    A      A
2   5678   2    2010-05-02   2010-05-02    A      A
3   1234   1    2000-01-01   2015-06-03    B      A&B
4   1234   2    2015-06-03   2015-06-03    A      A&B

I would like to group by ID and determine whether multiple distinct values exist for that ID in the Group column and, if so, concatenate them together.

Group2 is the desired output based on grouping ID and seeing if multiple values exist in Group. I am using dplyr but am unsure where to go from here:

df <- df %>% group_by(ID) %>% mutate(Group2 = if_else(?))

Solution

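  • I think the easiest way is to use the unique function, which returns a vector containing only the distinct values of the vector it is given. Sorting those values and pasting them together with collapse="&" yields a single value when an ID has only one Group, and the concatenation (e.g. A&B) when it has several, so no if/else is needed.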

    > df %>%
        group_by(ID) %>%
        mutate(Group2 = paste(sort(unique(Group)), collapse = "&"))
    
    # A tibble: 4 x 3
    # Groups:   ID [2]
         ID Group Group2
      <dbl> <chr> <chr> 
    1  5678 A     A     
    2  5678 A     A     
    3  1234 B     A&B   
    4  1234 A     A&B
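
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the dplyr package is installed, rebuilds only the ID, Rank and Group columns from the question's sample, and adds a logical column that explicitly flags IDs with more than one distinct Group. The names `MultiGroup` and `result` are made up for illustration and are not part of the original question or answer.

```r
library(dplyr)

# Sample data rebuilt from the question (date columns omitted for brevity)
df <- tibble(
  ID    = c(5678, 5678, 1234, 1234),
  Rank  = c(1, 2, 1, 2),
  Group = c("A", "A", "B", "A")
)

result <- df %>%
  group_by(ID) %>%
  mutate(
    # Hypothetical helper column: TRUE when the ID has more than one distinct Group
    MultiGroup = n_distinct(Group) > 1,
    # Distinct Group values, sorted, joined with "&"
    Group2 = paste(sort(unique(Group)), collapse = "&")
  ) %>%
  ungroup()  # drop the grouping so later steps are not applied per ID by accident

print(result)
```

The flag column is optional: because paste() over a single distinct value just returns that value, the Group2 expression already covers both the one-value and many-value cases.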