Calculate rowMeans by column groupings in R in dplyr without pivot_longer


I have a dataframe that looks like this:

> df[1:5,1:10]
         X    F1_01    F1_03    F1_04    F1_06    F1_09    F1_14    F1_15    F1_16    F1_17
1    gene0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2    gene1 3.420577 2.919879 2.287364 5.554634 2.233958 3.155860 2.946792 2.628113 2.702805
3   gene10 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
4  gene100 7.623784 7.035468 6.917434 6.276214 7.615697 5.822012 5.437085 4.691465 4.876582
5 gene1000 5.277115 6.184268 5.122632 5.827487 4.848992 3.419213 4.594827 4.123349 4.810539

Each sample column belongs to a group, defined like this:

groups <- data.frame(ID = c("F1_01", "F1_03", "F1_04", "F1_06", "F1_09", "F1_14", "F1_15", "F1_16", "F1_17"),
                     group = c("A", "B", "C", "A", "B", "C", "A", "B", "C"))

I would like the row means for each group (A, B, and C), i.e. one output column per group.
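To pin down the quantity being asked for, here is a worked version. The notation is introduced here, not taken from the question: x_ij is the value in row i and sample column j, and C_g is the set of sample columns that belong to group g. For gene1 (row 2 of the printed df), group A covers F1_01, F1_06 and F1_15:

$$
m_{i,g} = \frac{1}{|C_g|} \sum_{j \in C_g} x_{ij}
$$

$$
m_{\text{gene1},A} = \frac{3.420577 + 5.554634 + 2.946792}{3} = \frac{11.922003}{3} = 3.974001
$$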

How would I do this in dplyr? I can get the result with pivot_longer:

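library(dplyr)
library(tidyr)

tmp %>%
  pivot_longer(-x, names_to = "ID") %>%              # one row per (x, sample column)
  left_join(groups, by = "ID") %>%                   # attach each sample's group
  group_by(x, group) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  pivot_wider(names_from = group, values_from = mean)  # one column per group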

However, I don't want to use pivot_longer: the original data frame has about 15k rows and 48 columns, and my computer crashes when I try it. Is it possible to do this with rowMeans instead? I'm a bit stuck, and any help would be appreciated.

Data:
> dput(tmp)
structure(list(F1_01 = c(0, 3.420577, 0, 7.623784, 5.277115), 
    F1_03 = c(0, 2.919879, 0, 7.035468, 6.184268), F1_04 = c(0, 
    2.287364, 0, 6.917434, 5.122632), F1_06 = c(0, 5.554634, 
    0, 6.276214, 5.827487), F1_09 = c(0, 2.233958, 0, 7.615697, 
    4.848992), F1_14 = c(0, 3.15586, 0, 5.822012, 3.419213), 
    F1_15 = c(0, 2.946792, 0, 5.437085, 4.594827), F1_16 = c(0, 
    2.628113, 0, 4.691465, 4.123349), F1_17 = c(0, 2.702805, 
    0, 4.876582, 4.810539), x = c("id01", "id02", " id03", "id04", 
    "id05")), row.names = c(NA, 5L), class = "data.frame")

Solution

  • Here's a base R option:

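    # Drop the identifier column x (the last column), keeping only the sample columns
    tmp1 <- tmp[-ncol(tmp)]
    # Split the columns by groups$group (matched by position), take rowMeans() of
    # each piece, and bind the identifier column back on
    cbind(tmp[ncol(tmp)], sapply(split.default(tmp1, groups$group), rowMeans))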
    #      x        A        B        C
    #1  id01 0.000000 0.000000 0.000000
    #2  id02 3.974001 2.593983 2.715343
    #3  id03 0.000000 0.000000 0.000000
    #4  id04 6.445694 6.447543 5.872009
    #5  id05 5.233143 5.052203 4.450795
    

    split.default() assigns groups by position, not by name. If groups$ID is not in the same order as the columns of tmp, reorder the columns of tmp1 to match it after creating tmp1 and before the split.default() call:

    tmp1 <- tmp1[groups$ID]
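Since the question asks about dplyr, the same rowMeans() idea can also be written with mutate(). Here each group's columns are selected by name with all_of() inside pick(), so the result does not depend on column order, and no long (pivoted) copy of the data is built. A minimal sketch, assuming dplyr 1.1.0 or later (the release that added pick()); the data is copied from the question's dput(), and the list name ids is only an illustrative choice:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides pick()

# Data from the question's dput(); x is the row identifier
tmp <- data.frame(
  F1_01 = c(0, 3.420577, 0, 7.623784, 5.277115),
  F1_03 = c(0, 2.919879, 0, 7.035468, 6.184268),
  F1_04 = c(0, 2.287364, 0, 6.917434, 5.122632),
  F1_06 = c(0, 5.554634, 0, 6.276214, 5.827487),
  F1_09 = c(0, 2.233958, 0, 7.615697, 4.848992),
  F1_14 = c(0, 3.15586, 0, 5.822012, 3.419213),
  F1_15 = c(0, 2.946792, 0, 5.437085, 4.594827),
  F1_16 = c(0, 2.628113, 0, 4.691465, 4.123349),
  F1_17 = c(0, 2.702805, 0, 4.876582, 4.810539),
  x = c("id01", "id02", " id03", "id04", "id05")
)
groups <- data.frame(
  ID = c("F1_01", "F1_03", "F1_04", "F1_06", "F1_09", "F1_14", "F1_15", "F1_16", "F1_17"),
  group = c("A", "B", "C", "A", "B", "C", "A", "B", "C")
)

# Named list of column IDs per group: ids$A, ids$B, ids$C (illustrative name)
ids <- split(groups$ID, groups$group)

# Select each group's columns by name and average across them row by row
res <- tmp %>%
  mutate(
    A = rowMeans(pick(all_of(ids$A))),
    B = rowMeans(pick(all_of(ids$B))),
    C = rowMeans(pick(all_of(ids$C)))
  ) %>%
  select(x, A, B, C)

res
```

With three groups, writing out the three mutate() lines keeps the sketch readable; with many groups they could be generated instead of typed by hand.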