
Aggregate observations across samples by rownames (dplyr) in R


Aloha,

I am trying to get the total counts for each row name in my sample matrix. I have tried both rowsum and converting to a data frame and using dplyr::group_by, but both approaches give errors. Here is a subset of example data:

mat = matrix(c(0,1,2,3,4), nrow=3, ncol = 5)
rownames(mat) <- c("CHO", "NO", "O")
colnames(mat) <- c("sample_1", "sample_2", "sample_3", "sample_4", "sample_5")

I would like a resulting data frame with the formula name, the sum of observations across samples, and the percentage of samples in which the formula was observed.

It seems easy enough, but I have tried many different ways of aggregating the data to no avail and would be very appreciative of some guidance.


Solution

  • We may need only rowSums, which gives the total for each row across all samples

    rowSums(mat)
    

    If there are duplicate row names (in the example data, the row names are unique), use rowsum with the row names as the grouping variable, so that rows sharing a name are added together

    rowsum(mat, row.names(mat))
    

    and then apply rowSums to that result to get one total per row name

    rowSums(rowsum(mat, row.names(mat)))
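
The question also asks for a data frame holding the formula name, the total, and the percentage of samples in which each formula was observed. One way to put the pieces above together is sketched below. It assumes that "observed" means a count greater than zero in a sample, and summarise_formulas is a hypothetical helper name, not a function from base R or dplyr.

```r
# Example data from the question
mat <- matrix(c(0, 1, 2, 3, 4), nrow = 3, ncol = 5)
rownames(mat) <- c("CHO", "NO", "O")
colnames(mat) <- paste0("sample_", 1:5)

# summarise_formulas() is a hypothetical helper, named here for illustration
summarise_formulas <- function(m) {
  # Add up rows that share a name; with unique names the values are unchanged
  # (rowsum returns the groups sorted by name)
  collapsed <- rowsum(m, row.names(m))
  data.frame(
    formula = rownames(collapsed),
    total = rowSums(collapsed),
    # Assumption: a formula counts as "observed" in a sample when its count is > 0
    pct_samples = 100 * rowMeans(collapsed > 0),
    row.names = NULL
  )
}

result <- summarise_formulas(mat)

# The same helper on a matrix with a duplicated row name
mat_dup <- rbind(mat, CHO = c(1, 0, 0, 0, 0))
result_dup <- summarise_formulas(mat_dup)
```

Because rowsum is applied first, the helper should behave the same whether or not the row names are unique. If a different definition of "observed" is needed (for example, a minimum count), only the `collapsed > 0` condition would have to change.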