Search code examples
rdataframedplyrsapply

Sum across multiple columns in R with changing number of columns (2 to 3) using the column names pattern


I have a dataframe with 10 columns.

I need to

  • first sum the columns 1+2 and 3+4 together,

  • then to sum up the columns 5+6+7 and 8+9+10;

  • and save as a dataframe.

I have made it work (see below), but manually, using 2 sapply functions, then t and rbind, and thus I seek more elegant solution.

The column names exhibit a clear pattern across them.

The list for the first 4 columns looks like this: “on_b_, off_b_” and repeat (thus I am summing up columns 1 & 2, and then 3 & 4)

The list for the next 6 columns looks like this: “on_b_, something else in between, off_b_” and repeat (thus I am summing up 5 & 6 & 7 and then 8 & 9 & 10)

I need the function to restart once it sees the beginning of the list again. The beginning one is the column with the name starting from on_b_:

on_b_, (something else entirely irrelevant in between, like CC_, or nothing at all), off_b_

Here is the reproducible code chunk:

df = structure(list(on_b_pos_1 = 64, off_b_4 = 25, 
                    on_b_6 = 28, off_b_8 = 157, 
                    on_b_3 = 42, CC_2 = 0,  off_b_4 = 125, 
                    on_b_5 = 51, CC_7 = 0, off_b_8 = 15), 
               row.names = 14L, class = "data.frame")

This is what I have done before and it works:

# adding together TWO columns
a <- data.frame(sapply(seq(1,4,by=2),function(i) rowSums(df[,i:(i+1)])))

# check whether the function i is getting the correct columns where it restarts
# seq(1,4,by=2) 

# adding together THREE columns
b <- data.frame(sapply(seq(5,ncol(df),by=3),function(i) rowSums(df[,i:(i+2)])))

# transpose
a <- t(a)
b <- t(b)

c <- cbind(a, b)

The result should look like this:

Column 1  Column 2  Column 3  Column 4
  89        185       167        66

Solution

  • You can use the map2 function from the purrr package, where you look for the column index that starts with "on" or "off" by grep.

    If you have more than 1 row in your df, wrap my code with sapply or map to iterate over the rows.

    library(purrr)
    
    map2_int(grep("^on", colnames(df)), 
             grep("^off", colnames(df)), 
             ~rowSums(df[, .x:.y]))
    
    [1]  89 185 167  66