I have a dataset as follows:
```
Apr May Jun Jul Aug Sep Oct Nov b
1.0 9.0 4.0 5.3 6.4 3.4 2.5 4.3 2
5.0 6.0 9.0 2.3 5.8 2.3 6.5 5.2 3
8.0 4.0 6.0 0.7 5.2 1.2 2.2 6.1 4
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 7
3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2 8
4.4 4.1 5.1 6.1 7.1 8.1 9.1 6.8 6
5.6 5.0 3.2 4.2 5.2 1.2 2.2 3.2 5
6.8 5.9 8.9 2.3 3.3 5.7 4.7 3.7 5
8.0 6.8 9.8 4.8 5.8 6.8 7.8 8.8 5
9.2 7.7 7.7 2.8 3.8 4.8 5.8 6.8 6
```
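I want to add a column `sum` that, for each row, adds up the values from the column given by `b` through column 8 (Nov). I tried `data$sum = rowSums(data[data$b:8])`, but I get the warning `numerical expression has 2124 elements: only the first used`. Please let me know a better method.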
Here's a solution based on your comments:
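```r
data$sum <- NA  # create the result column before the loop; each iteration fills one row
for (rowIdx in seq_len(nrow(data))) {
  startCol <- data[rowIdx, "b"]                         # first month column for this row
  data[rowIdx, "sum"] <- sum(data[rowIdx, startCol:8])  # sum from startCol through column 8 (Nov)
}
```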
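The same per-row sum can also be written with `apply()` instead of an explicit loop. The sketch below rebuilds the sample data with `read.table()` so it runs on its own, and assumes, as the loop does, that the month columns are columns 1 to 8 and `b` is column 9.

```r
# Rebuild the sample data from the question (months in columns 1-8, b in column 9)
data <- read.table(header = TRUE, text = "
Apr May Jun Jul Aug Sep Oct Nov b
1.0 9.0 4.0 5.3 6.4 3.4 2.5 4.3 2
5.0 6.0 9.0 2.3 5.8 2.3 6.5 5.2 3
8.0 4.0 6.0 0.7 5.2 1.2 2.2 6.1 4
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 7
3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2 8
4.4 4.1 5.1 6.1 7.1 8.1 9.1 6.8 6
5.6 5.0 3.2 4.2 5.2 1.2 2.2 3.2 5
6.8 5.9 8.9 2.3 3.3 5.7 4.7 3.7 5
8.0 6.8 9.8 4.8 5.8 6.8 7.8 8.8 5
9.2 7.7 7.7 2.8 3.8 4.8 5.8 6.8 6
")

# apply(..., 1, f) passes each row to f as a numeric vector:
# x[1:8] are the monthly values and x[9] is that row's b
data$sum <- apply(data[, 1:9], 1, function(x) sum(x[x[9]:8]))

data$sum
# should be 34.9 31.1 15.4 4.0 3.2 24.0 11.8 17.4 29.2 17.4
```

`apply()` converts the selected columns to a matrix first, which is fine here because they are all numeric; with a character column mixed in, each row would arrive as a character vector and the sum would fail.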
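The simplest fix is a row-by-row `for` loop or `apply()` call, because the `[` subset operator cannot take a different starting column for each row. In your attempt, `data$b:8` only uses the first value of `b` (that is what the warning is telling you), so `data[data$b:8]` selects the same columns for every row.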
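The warning in the question comes from the `:` operator: when its first argument is a vector, R (in its default configuration) uses only the first element and warns. A minimal sketch, using the `b` values from the sample data, that records the warning and shows the index that actually gets built:

```r
b <- c(2, 3, 4, 7, 8, 6, 5, 5, 5, 6)  # the b column of the sample data

warnings_seen <- character(0)
cols <- withCallingHandlers(
  b:8,
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))  # record the message
    invokeRestart("muffleWarning")                           # then continue
  }
)

cols           # 2 3 4 5 6 7 8: only b[1] was used as the start
warnings_seen  # "numerical expression has 10 elements: only the first used"
```

With the full data, `data[data$b:8]` therefore behaves like `data[2:8]`, the May to Nov columns for every row, and the per-row values of `b` are ignored.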
Two things can happen when you use `[]` without a comma, depending on your data structure:
- If `data` is a matrix, `[]` treats the entire matrix as a single vector in which the columns follow one after another. For example, `data[1:15]` returns the 10 values in the "Apr" column followed by the first 5 values in the "May" column.
- If `data` is a `data.frame`, the indices select columns: `data[1:5]` is the same as `data[, 1:5]`. This is because a `data.frame` is really a `list()` under the hood, with each column an element of the list.
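A small sketch of both cases, using toy objects `m` and `df` (illustrative names, not from the question) built from the first few month columns of the sample data:

```r
# Matrix: single-bracket indexing without a comma treats m as one long vector,
# stored column by column (all of Apr, then all of May)
m <- cbind(Apr = c(1.0, 5.0, 8.0, 2.0, 3.2, 4.4, 5.6, 6.8, 8.0, 9.2),
           May = c(9.0, 6.0, 4.0, 2.0, 3.2, 4.1, 5.0, 5.9, 6.8, 7.7))
m[1:15]                                            # 10 Apr values, then the first 5 May values
identical(m[1:15], c(m[, "Apr"], m[1:5, "May"]))   # TRUE

# data.frame: a list of columns, so a single index selects columns
df <- as.data.frame(m)
df$Jun <- c(4.0, 9.0, 6.0, 2.0, 3.2, 5.1, 3.2, 8.9, 9.8, 7.7)
is.list(df)                     # TRUE
length(df)                      # 3: the number of columns, not cells
identical(df[1:2], df[, 1:2])   # TRUE
```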