Tags: r, datatable, sum, row, multiple-columns

Warning numerical expression has >1 elements: only the first used


I have a dataset as follows:

Apr May Jun Jul Aug Sep Oct Nov b
1.0 9.0 4.0 5.3 6.4 3.4 2.5 4.3 2
5.0 6.0 9.0 2.3 5.8 2.3 6.5 5.2 3
8.0 4.0 6.0 0.7 5.2 1.2 2.2 6.1 4
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 7
3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2 8
4.4 4.1 5.1 6.1 7.1 8.1 9.1 6.8 6
5.6 5.0 3.2 4.2 5.2 1.2 2.2 3.2 5
6.8 5.9 8.9 2.3 3.3 5.7 4.7 3.7 5
8.0 6.8 9.8 4.8 5.8 6.8 7.8 8.8 5
9.2 7.7 7.7 2.8 3.8 4.8 5.8 6.8 6

I want to add a column `sum` using `data$sum = rowSums(data[data$b:8])`, but I get the warning `numerical expression has 2124 elements: only the first used`. Please let me know a better method.


Solution

  • Here's a solution based on your comments:

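    data$sum <- NA_real_ # important to create the column before the for loop
    for (rowIdx in seq_len(nrow(data))) {   # seq_len() is safe even with zero rows
       startCol <- data[rowIdx, "b"]        # this row's starting column index
       data[rowIdx, "sum"] <- sum(data[rowIdx, startCol:8])   # sum from startCol through column 8
    }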
    

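    You need a for loop or an apply-style call to achieve this, because the [ subset operator cannot take a different starting column for each row. The original attempt triggers the warning because the : operator expects a single number on each side: data$b is a whole column, so R uses only its first element.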
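
    As a rough sketch of the apply-style alternative, assuming data is a plain data.frame whose first eight columns are the months and whose b column holds the starting column index. The two-row data below is a cut-down copy of the question's table, and the helper name months is made up for illustration:

    ```r
    # First two rows of the question's table, rebuilt for illustration
    data <- data.frame(
      Apr = c(1.0, 5.0), May = c(9.0, 6.0), Jun = c(4.0, 9.0), Jul = c(5.3, 2.3),
      Aug = c(6.4, 5.8), Sep = c(3.4, 2.3), Oct = c(2.5, 6.5), Nov = c(4.3, 5.2),
      b = c(2, 3)
    )

    # Work on a numeric matrix of the month columns so each row is a plain vector
    months <- as.matrix(data[, 1:8])

    # For each row, sum from that row's own starting column (b) through column 8
    data$sum <- vapply(
      seq_len(nrow(data)),
      function(i) sum(months[i, data$b[i]:8]),
      numeric(1)
    )

    print(data$sum)
    ```

    This does the same work as the loop, one row at a time; vapply just collects the results and checks that each one is a single number.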

    Two things can happen when you use [] without a comma depending on your data structure:

    • If data is a matrix it will treat the entire matrix as a single vector, where each column occurs one after another. For example, data[1:15] will return the 10 values in the "Apr" column then the first 5 values in the "May" column.

    • If data is a data.frame it will use the indices to look up columns. That is, data[1:5] is the same as data[,1:5]. The reason for this is that a data.frame is really a list() under the hood, where each column is an element of the list().
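
    The two cases above can be checked on a small toy example. The objects m and df here are made up for illustration and are not the question's data:

    ```r
    # A 3 x 2 integer matrix and the equivalent data.frame
    m  <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("x", "y")))
    df <- as.data.frame(m)

    # Matrix: a single index walks the values column by column,
    # so this takes all three values of "x" and then the first value of "y"
    first_four <- m[1:4]

    # data.frame: a single index selects whole columns, like df[, 1:2]
    same_columns <- isTRUE(all.equal(df[1:2], df[, 1:2]))

    # A data.frame is a list of columns, so [[ ]] pulls out one column as a vector
    is_list    <- is.list(df)
    second_col <- df[[2]]

    print(first_four)
    print(same_columns)
    print(second_col)
    ```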