I have what should be an easy question, but I cannot find a simple solution myself. I have a data.frame of expression data. Each row corresponds to one measured gene, and the columns are the expression values measured at different timepoints, with 4 replicates per timepoint. It looks a bit like this:
0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4 3h_1 3h_2 3h_3 3h_4
gene1 434 123 42 94 9811 262 117 42 327 367 276 224
gene2 47 103 30 847 13 291 167 358 303 293 2263 741
gene3 322 27 97 217 223 243 328 308 328 299 518 434
I want to sum up all the replicates for each row, so that the result will have a row for each gene and just ONE column for each timepoint instead of FOUR. Is there any function that lets me do that efficiently?
For clarification: what I am looking for is a data.frame like this:
0h 1h 2h 3h ...
gene1 693 10232
gene2 1027 829
gene3
Thanks in advance. Best, Jonas
Here's an option in base R:
res <- as.data.frame(lapply(split.default(df1, sub("_.*$","",names(df1))), rowSums))
names(res) <- gsub("^X","",names(res))
res
# 0h 1h 2h
# gene1 693 10232 1194
# gene2 1027 829 3600
# gene3 663 1102 1579
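If the one-liner above is hard to read, the same steps can be unrolled roughly as follows. This is only a sketch: a cut-down `df1` is rebuilt inline (with `check.names = FALSE`, so the column names keep their leading digit) to make the block run on its own, and `timepoint` and `groups` are names made up here for illustration.

```r
# Small stand-in for the real data; check.names = FALSE keeps "0h_1" etc. as-is
df1 <- read.table(text = "
0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4
gene1 434 123 42 94 9811 262 117 42
gene2 47 103 30 847 13 291 167 358
", header = TRUE, check.names = FALSE)

# 1. Drop the replicate suffix so each column is labelled by its timepoint
timepoint <- sub("_.*$", "", names(df1))

# 2. split.default() treats the data.frame as a list of columns and
#    groups those columns by timepoint
groups <- split.default(df1, timepoint)

# 3. rowSums() adds up the replicates within each group; sapply() binds the
#    per-timepoint sums into a matrix with one column per timepoint
res <- as.data.frame(sapply(groups, rowSums))
res
```

Going through `sapply()` first gives a matrix, and converting a matrix with `as.data.frame()` leaves names such as `0h` untouched, so the extra `gsub("^X", ...)` step should not be needed in this variant.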
Data:
df1 <- read.table(text="
0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
gene1 434 123 42 94 9811 262 117 42 327 367 276 224
gene2 47 103 30 847 13 291 167 358 303 293 2263 741
gene3 322 27 97 217 223 243 328 308 328 299 518 434
",header=TRUE)
names(df1) <- gsub("^X","",names(df1))
df1
# 0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
# gene1 434 123 42 94 9811 262 117 42 327 367 276 224
# gene2 47 103 30 847 13 291 167 358 303 293 2263 741
# gene3 322 27 97 217 223 243 328 308 328 299 518 434
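Another possibility, assuming the same column naming scheme, is base R's `rowsum()`, which sums the rows of a matrix by group. Because the replicates are columns here, the sketch below transposes the data, sums by timepoint, and transposes back. The names `timepoint` and `res2` are just illustrative, and the data is read with `check.names = FALSE` so no `X` prefix has to be stripped.

```r
# Same example data, read without name mangling
df1 <- read.table(text = "
0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
gene1 434 123 42 94 9811 262 117 42 327 367 276 224
gene2 47 103 30 847 13 291 167 358 303 293 2263 741
gene3 322 27 97 217 223 243 328 308 328 299 518 434
", header = TRUE, check.names = FALSE)

# Timepoint label for every column ("0h", "0h", ..., "2h")
timepoint <- sub("_.*$", "", names(df1))

# rowsum() works on rows, so transpose first and transpose the result back;
# the result is a matrix with genes in rows and timepoints in columns
res2 <- t(rowsum(t(df1), group = timepoint))
res2
```

By default `rowsum()` returns the groups in sorted order, which for labels like `10h` would be alphabetical rather than chronological; passing `reorder = FALSE` keeps the order in which the timepoints first appear.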