
Is there an R function for collapsing characters into one cell if they have a matching character in another cell?


I have a dataframe with two columns of characters that looks like this:

name gene
GO:00001 Gene_1
GO:00001 Gene_2
GO:00002 Gene_3
GO:00002 Gene_4
GO:00002 Gene_5

But I need to collapse the rows so that each "name" appears only once and the "gene" column lists every gene that matches that "name", separated by a comma and a space, like so:

name gene
GO:00001 Gene_1, Gene_2
GO:00002 Gene_3, Gene_4, Gene_5

I have looked into the documentation for melt, collapse, and summarize, but I can't figure out how to do this with characters. Any help is much appreciated!!


Solution

  • Using dplyr (`collapse = ", "` joins each group's genes with a comma and a space):

    > library(dplyr)
    > df %>% 
        group_by(name) %>% 
        summarise(gene = paste(gene, collapse = ", "))
    # A tibble: 2 × 2
      name     gene                  
      <chr>    <chr>                 
    1 GO:00001 Gene_1, Gene_2        
    2 GO:00002 Gene_3, Gene_4, Gene_5

  • Using base R (extra arguments to aggregate, such as collapse, are passed on to FUN):

    aggregate(gene ~ name, data = df, FUN = paste, collapse = ", ")
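To try this without the original data, here is a minimal sketch that rebuilds the example data frame from the question and collapses it with base R only. The object names `df` and `collapsed` are illustrative, and `toString()` is used as a shorthand for `paste(x, collapse = ", ")`; it is one reasonable choice, not the only one.

```r
# Rebuild the example data shown in the question.
df <- data.frame(
  name = c("GO:00001", "GO:00001", "GO:00002", "GO:00002", "GO:00002"),
  gene = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  stringsAsFactors = FALSE
)

# Collapse the genes within each name into one comma-and-space separated string.
# toString(x) is equivalent to paste(x, collapse = ", ").
collapsed <- aggregate(gene ~ name, data = df, FUN = toString)

print(collapsed)
```

The result has one row per distinct `name`, with the matching genes joined in their original order.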