r aggregate taxonomy

aggregate columns by character


Hi, I would like to aggregate several columns.

d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1", 
"k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L, 
1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L, 
1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L, 
1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L, 
1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L, 
1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L, 
1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"), 
    SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889, 
    11.5555555555556, 3.35978835978836)), .Names = c("Gene", 
"phylum", "class", "order", "family", "genus", "species", "SampleA", 
"SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
), class = "data.frame")

This is the input data frame to aggregate:

        Gene     phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0       3.99
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      11.56
   k141_70_3 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0       3.36

What I want at the end is a single row for each unique combination of the taxonomy columns, with the sample columns summed. In this case it would look like this (we can remove the Gene column):

        phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
    Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      15.54
    Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0       3.36

Note that this is a very simple example. I have 20 samples and more than 500 species in the original data frame.


Solution

  • Here's a dplyr solution:

    library(dplyr)

    # Group by every taxonomy level, then sum each numeric (sample) column
    d %>%
      group_by(phylum, class, order, family, genus, species) %>%
      summarise_if(is.numeric, sum)

    # Groups: phylum, class, order, family, genus [?]
    
          phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
          <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
    1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444
    2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979
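
For comparison, the same grouping-and-summing can probably also be done without extra packages. The sketch below uses base R's `aggregate()` on a small copy of the example data. The helper names `tax_cols` and `sample_cols` are illustrative and not part of the original post, and it assumes every column other than `Gene` and the taxonomy levels is a numeric sample column.

```r
# Small copy of the example data (strings instead of factors, for brevity)
d <- data.frame(
  Gene       = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum     = "Firmicutes",
  class      = "Bacillales",
  order      = "Bacilli",
  family     = "Bacillaceae",
  genus      = "Bacillus",
  species    = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA    = c(0, 0, 0),
  SampleB    = c(0, 0, 0),
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

# Hypothetical helper names: the grouping columns and the columns to sum
tax_cols    <- c("phylum", "class", "order", "family", "genus", "species")
sample_cols <- setdiff(names(d), c("Gene", tax_cols))  # assumes the rest are samples

# One row per unique taxonomy combination, each sample column summed
agg <- aggregate(d[sample_cols], by = d[tax_cols], FUN = sum)
print(agg)
```

Because `sample_cols` is derived from the column names, the same call should carry over to 20 samples without listing them by hand. One caveat: `aggregate()` drops rows where any grouping column is `NA`, so missing taxonomy levels would need to be recoded (for example to "unknown") beforehand.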