I have a dataframe with 10 columns.
I need to
first sum the columns 1+2 and 3+4 together,
then to sum up the columns 5+6+7 and 8+9+10;
and save as a dataframe.
I have made it work (see below), but manually, using 2 sapply
functions, then t
and rbind
, and thus I seek more elegant solution.
The column names exhibit a clear pattern across them.
The list for the first 4 columns looks like this:
“on_b_, off_b_”
and repeat (thus I am summing up columns 1 & 2, and then 3 & 4)
The list for the next 6 columns looks like this:
“on_b_, something else in between, off_b_”
and repeat (thus I am summing up 5 & 6 & 7 and then 8 & 9 & 10)
I need the function to restart once it sees the beginning of the list again. The beginning one is the column with the name starting from on_b_
:
on_b_, (something else entirely irrelevant in between, like CC_, or nothing at all), off_b_
Here is the reproducible code chunk:
df = structure(list(on_b_pos_1 = 64, off_b_4 = 25,
on_b_6 = 28, off_b_8 = 157,
on_b_3 = 42, CC_2 = 0, off_b_4 = 125,
on_b_5 = 51, CC_7 = 0, off_b_8 = 15),
row.names = 14L, class = "data.frame")
This is what I have done before and it works:
# adding together TWO columns
a <- data.frame(sapply(seq(1,4,by=2),function(i) rowSums(df[,i:(i+1)])))
# check whether the function i is getting the correct columns where it restarts
# seq(1,4,by=2)
# adding together THREE columns
b <- data.frame(sapply(seq(5,ncol(df),by=3),function(i) rowSums(df[,i:(i+2)])))
# transpose
a <- t(a)
b <- t(b)
c <- cbind(a, b)
The result should look like this:
Column 1 Column 2 Column 3 Column 4
89 185 167 66
You can use the map2
function from the purrr
package, where you look for the column index that starts with "on" or "off" by grep
.
If you have more than 1 row in your df
, wrap my code with sapply
or map
to iterate over the rows.
library(purrr)
map2_int(grep("^on", colnames(df)),
grep("^off", colnames(df)),
~rowSums(df[, .x:.y]))
[1] 89 185 167 66