
Sum subsequent entries (replicates) in data.frame


I have a seemingly simple question that I can't find a clean solution for. I have a data.frame of expression data. Each row corresponds to one measured gene, and the columns hold the expression measured at different timepoints, with 4 replicates per timepoint. It looks a bit like this:

         0h_1    0h_2    0h_3    0h_4    1h_1   1h_2    1h_3   1h_4    2h_1    2h_2    2h_3     2h_4    3h_1     3h_2     3h_3    3h_4 
gene1    434     123     42      94      9811   262     117    42      327     367     276      224
gene2    47      103     30      847     13     291     167    358     303     293     2263     741
gene3    322     27      97      217     223    243     328    308     328     299     518      434

I want to sum up all the replicates for each row, so that the result will have a row for each gene and just ONE column for each timepoint instead of FOUR. Is there any function that lets me do that efficiently?

For clarification: what I am looking for is a data.frame like this:

         0h     1h     2h     3h     ...
gene1   693     10232
gene2   1027    829
gene3 

Thanks in advance. Best, Jonas


Solution

  • Here's an option in base R:

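    # Group the columns by timepoint (the name without its "_<replicate>" suffix), then sum each group row-wise
    res <- as.data.frame(lapply(split.default(df1, sub("_.*$","",names(df1))), rowSums))
    # as.data.frame() prefixes non-syntactic names such as "0h" with "X"; strip that prefix
    names(res) <- gsub("^X","",names(res))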
    res
    #         0h    1h   2h
    # gene1  693 10232 1194
    # gene2 1027   829 3600
    # gene3  663  1102 1579
    

    data

    df1 <- read.table(text="
    0h_1    0h_2    0h_3    0h_4    1h_1   1h_2    1h_3   1h_4    2h_1    2h_2    2h_3     2h_4 
    gene1    434     123     42      94      9811   262     117    42      327     367     276      224
    gene2    47      103     30      847     13     291     167    358     303     293     2263     741
    gene3    322     27      97      217     223    243     328    308     328     299     518      434
    ",header=TRUE)
    
    # read.table() also prefixes the non-syntactic column names with "X"; strip that prefix
    names(df1) <- gsub("^X","",names(df1))
    df1
    #       0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
    # gene1  434  123   42   94 9811  262  117   42  327  367  276  224
    # gene2   47  103   30  847   13  291  167  358  303  293 2263  741
    # gene3  322   27   97  217  223  243  328  308  328  299  518  434
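
The one-liner above packs several steps together, so here is a sketch that unpacks them one at a time. To keep it standalone it rebuilds a reduced `df1` (gene1 and gene2, timepoints 0h and 1h only, values copied from the question). The intermediate names `timepoint`, `groups`, `sums` and `res2` are made up for illustration. The last part cross-checks the result against `rowsum()`, a different base R function that is not part of the original answer.

```r
# Reduced stand-in for the data above: gene1 and gene2, timepoints 0h and 1h only.
df1 <- data.frame(
  `0h_1` = c(434, 47),  `0h_2` = c(123, 103),
  `0h_3` = c(42, 30),   `0h_4` = c(94, 847),
  `1h_1` = c(9811, 13), `1h_2` = c(262, 291),
  `1h_3` = c(117, 167), `1h_4` = c(42, 358),
  row.names = c("gene1", "gene2"),
  check.names = FALSE  # keep names such as "0h_1" instead of "X0h_1"
)

# Step 1: drop the "_<replicate>" suffix to get one timepoint label per column.
timepoint <- sub("_.*$", "", names(df1))

# Step 2: split the columns into one data frame per timepoint.
groups <- split.default(df1, timepoint)

# Step 3: sum the replicates within each timepoint, gene by gene.
sums <- lapply(groups, rowSums)

# Step 4: reassemble; check.names = FALSE avoids the "X" prefix on "0h" and "1h".
res <- as.data.frame(sums, check.names = FALSE)

# Alternative (not from the original answer): rowsum() sums matrix rows by group,
# so transpose, sum by timepoint, and transpose back.
res2 <- as.data.frame(t(rowsum(t(df1), group = timepoint)))

# Both routes should agree on the summed values.
stopifnot(all(as.matrix(res) == as.matrix(res2)))
```

One thing to watch with either route: the groups come back ordered by their sorted labels, so a timepoint such as "10h" would land before "2h". If the original column order matters, reorder the result afterwards, for example with `res[unique(timepoint)]`.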