Tags: r, dplyr, summarize

Summarizing an unknown number of columns in R using dplyr


I have the following data.frame (df):

ID1 ID2 Col1 Col2 Col3 Grp
A   B   1    3    6    G1
C   D   3    5    7    G1
E   F   4    5    7    G2
G   h   5    6    8    G2

What I would like to achieve is the following: group by Grp (easy), and then summarize so that, for each group, I sum the numeric columns and create string columns listing all ID1s and ID2s.

It would be something like this:

df %>% 
  group_by(Grp) %>% 
  summarize(ID1s=toString(ID1), ID2s=toString(ID2), Col1=sum(Col1), Col2=sum(Col2), Col3=sum(Col3))

Everything is fine when I know the number of columns (Col1, Col2, Col3). However, I would like this to work for a data frame where ID1, ID2, and Grp are always present under those names, plus any number of additional numeric columns with unknown names.

Is there a way to do this in dplyr?


Solution

  • I would like to be able to implement it so that it would work for a data frame where ID1, ID2, and Grp are always present under those names, plus any number of additional numeric columns with unknown names.

    You can overwrite the ID columns first and then group by them as well:

    df %>% 
      group_by(Grp) %>%
      # collapse the IDs per group, then group by them so only the rest is summed
      mutate_each(funs(. %>% unique %>% sort %>% toString), ID1, ID2) %>% 
      group_by(ID1, ID2, add=TRUE) %>%
      summarise_each(funs(sum))
    
    # Source: local data frame [2 x 6]
    # Groups: Grp, ID1 [?]
    # 
    #     Grp   ID1   ID2  Col1  Col2  Col3
    #   (chr) (chr) (chr) (int) (int) (int)
    # 1    G1  A, C  B, D     4     8    13
    # 2    G2  E, G  F, h     9    11    15
    

    I think you'll want to uniqify and sort before collapsing to a string, so I've added those steps.
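
For readers on a recent dplyr: mutate_each(), summarise_each() and funs() have since been deprecated. The sketch below shows one way the same idea might be written with across(), assuming dplyr 1.0.0 or later. The data frame is rebuilt from the question so the block runs on its own; the name result and the use of where(is.numeric) to pick up the unnamed numeric columns are choices made here, not part of the original answer.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID1  = c("A", "C", "E", "G"),
  ID2  = c("B", "D", "F", "h"),
  Col1 = c(1, 3, 4, 5),
  Col2 = c(3, 5, 5, 6),
  Col3 = c(6, 7, 7, 8),
  Grp  = c("G1", "G1", "G2", "G2"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Grp) %>%
  summarise(
    # collapse each ID column to one sorted, de-duplicated string per group
    across(c(ID1, ID2), ~ toString(sort(unique(.x)))),
    # sum every remaining numeric column, whatever it is called
    across(where(is.numeric), sum),
    .groups = "drop"
  )

print(result)
```

Because the numeric columns are selected by type rather than by name, this should work unchanged when further numeric columns are added, as long as ID1, ID2 and Grp keep their names and are not themselves numeric.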