I have a dataframe with two columns of characters that looks like this:
name | gene |
---|---|
GO:00001 | Gene_1 |
GO:00001 | Gene_2 |
GO:00002 | Gene_3 |
GO:00002 | Gene_4 |
GO:00002 | Gene_5 |
But I need to collapse the columns so that the "name" column isn't repetitive and the "gene" column contains each gene that matches to the same "name", separated by a comma and a space, like so:
name | gene |
---|---|
GO:00001 | Gene_1, Gene_2 |
GO:00002 | Gene_3, Gene_4, Gene_5 |
I have looked into the documentation for melt, collapse, and summarize, but I can't figure out how to do this with characters. Any help is much appreciated!!
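Using dplyr (note the space after the comma in `collapse`, which gives the separator you asked for):
library(dplyr)
df %>%
  group_by(name) %>%
  summarise(gene = paste(gene, collapse = ", "))
# A tibble: 2 × 2
  name     gene
  <chr>    <chr>
1 GO:00001 Gene_1, Gene_2
2 GO:00002 Gene_3, Gene_4, Gene_5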
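Using base R (`paste0` on its own does not collapse anything, so pass `collapse` explicitly):
aggregate(gene ~ name, data = df, FUN = function(x) paste(x, collapse = ", "))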
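If you want to try this on the data from the question without installing anything, here is a minimal sketch. It assumes the data frame is called `df` with character columns `name` and `gene`, as in the question; the variable name `collapsed` is just for illustration. It uses `toString()`, which base R documents as pasting the elements of a vector together with `", "` as the separator.

```r
# Rebuild the example data from the question.
df <- data.frame(
  name = c("GO:00001", "GO:00001", "GO:00002", "GO:00002", "GO:00002"),
  gene = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  stringsAsFactors = FALSE
)

# Collapse all genes that share a name into one comma-and-space separated string.
# toString(x) behaves like paste(x, collapse = ", ").
collapsed <- aggregate(gene ~ name, data = df, FUN = toString)

print(collapsed)
```

The result should have one row per `name`, with `gene` holding a single string per row rather than a list of vectors.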