Search code examples
rloopsconditional-statementssumifsstep-through

In R, check multiple values multiple times, sum if conditions are met, keep track of position, export data, and repeat


Here is a sampling of my data, which is in a matrix:

BLOCK  RUNTIME
101    50
101    20
101    -25
101    -40
101    35
101    45
202    25
202    -10
202    -35
202    40
202    50
202    30
202    -20
202    15
.
.
.
n

The desired output of my yet-to-be code is this:

BLOCK  TIME_CHUNKS
101    70
101    -65
101    80
202    25
202    -45
202    120
202    -20
202    15

What I want the code to do to get the output: for as long as subsequent rows' BLOCK values are equal to the current row's BLOCK value, AND for as long as subsequent rows' RUNTIME values are the same sign, sum them, and populate a new row in a table with the BLOCK value and the summed value. Then proceed where you left off. In the example data, the first two rows (50, 20) are summed and written to a table. Then the sign of RUNTIME switches and is constant for two rows (-25, -40), so these values are summed. Then the sign of RUNTIME switches again and is positive for three rows (35, 45, 25), but the last row of these three has a different BLOCK number, so only the first two of these three rows are summed and written to the table. Proceed until the end of the matrix is reached.

It's easy enough for me to make the conditional statements, but I don't know how to "keep track" of where I am in the matrix to restart the summing process. I'm not that well-versed in for or while loops to even make an educated guess at how they could be used here. I made a pathetic attempt at writing a function, but didn't get anywhere.

Need to automate this process...I've got about 10,000 rows of data, which can be dynamically produced based on a set of scenario variables. Each set of variables will produce a table with different values for RUNTIME, and I've got a lot of scenarios to run.

Much appreciation for any help.


Solution

  • TIME_CHUNKS <- with(df, tapply(RUNTIME, BLOCK, function(x)
      tapply(x, cumsum(c(1, diff(sign(x)) != 0)), sum)))
    
    out <- data.frame(BLOCK = rep.int(unique(df$BLOCK), sapply(TIME_CHUNKS, length)),
                      TIME_CHUNKS = unlist(TIME_CHUNKS), row.names = NULL)
    

    Example

    Use this data and paste the above code

     df <- data.frame(BLOCK = c(101, 101, 101, 101, 101, 101, 202, 202,
     202, 202, 202, 202, 202, 202), RUNTIME = c(50, 20, -25, -40,
     35, 45, 25, -10, -35, 40, 50, 30, -20, 15))
    

    and you will get out to be:

     BLOCK TIME_CHUNKS
       101          70
       101         -65
       101          80
       202          25
       202         -45
       202         120
       202         -20
       202          15
    

    Explanation

    Notice the nested use of tapply. In words, what the above code is doing to calculate TIME_CHUNKS is:

    1. Split up each row by BLOCK, and let x be the vectors for a given block (for example, x will at first be c(50, 20, -25, -40, 35, 45).

    2. The quizzical looking cumsum(c(1, diff(sign(x)) != 0)) simply sub-divides our block into consecutive groups of same-signed numbers. Namely, diff(sign(x)) != 0 gives a vector of TRUEs and FALSEs according as the sign switches or not, and coercing to integer in combination with cumsum yields a vector that gives a different number for each sub-sequence with the same sign. Applying a sum along each subsequence gives the result we're looking for.