Search code examples
rdataframedata-manipulation

Waiting until value changes to make calculation


I have the following dataframe:

time <- c(1,2,3,4,5,6,7,8,9,10,11,12)
threshold <- c(1,1,1,0,0,2,2,1,1,2,2,0)
value <- c(5,3,2,4,6,9,1,10,3,5,2,4)

df <- data.frame(time, threshold, value)

The logic behind the threshold variable is that when there is a 1: store the first value (5) and subtract the number when the threshold changes (4). During the first threshold period of "1", the calculation would be 5 - 4 = 1, which should be stored in a new column at time t=4.

Threshold "0" means no calculation.

Threshold "2" means just the reverse of "1", where the the initial value 9 is subtracted by 10.

The goal would be a table like this: enter image description here

Is there a way to calculate perform this calculation in R?


Solution

  • If you use the rle function, you can focus on rows where there is a threshold change. Then, align the values to be subtracted and translate the threshold to a factor controlling the direction of the subtraction.

    #Select only rows with a threshold change,
    # based on https://stackoverflow.com/a/27482738/14027775
    compact_threshold <- rle(df$threshold)
    row_ids <- cumsum(c(1, compact_threshold$lengths[-length(compact_threshold$lengths)]))
    
    #Transform thresholds to a factor 
    # if the row is in a threshold change
    df$subFactor <- df$threshold
    df$subFactor[-row_ids] <- NA
    df$subFactor[df$subFactor==0] <- NA
    df$subFactor[df$subFactor==2] <- -1
    
    #Align each value with the one 
    # corresponding to the next threshold change
    df$value2 <- NA
    df$value2[row_ids] <- c(df$value[row_ids][-1], NA)
    
    df$calculation <- (df$value-df$value2)*df$subFactor
    #Shift results to the next threshold
    df$calculation[row_ids] <- c(NA, df$calculation[row_ids][-length(row_ids)])
    
    df <- df[,c("time", "threshold", "value", "calculation")]
    df
    #   time threshold value calculation
    #1     1         1     5          NA
    #2     2         1     3          NA
    #3     3         1     2          NA
    #4     4         0     4           1
    #5     5         0     6          NA
    #6     6         2     9          NA
    #7     7         2     1          NA
    #8     8         1    10           1
    #9     9         1     3          NA
    #10   10         2     5           5
    #11   11         2     2          NA
    #12   12         0     1          -4
    

    Raw data

    #Data
    time <- c(1,2,3,4,5,6,7,8,9,10,11,12)
    threshold <- c(1,1,1,0,0,2,2,1,1,2,2,0)
    value <- c(5,3,2,4,6,9,1,10,3,5,2,1)
    
    df <-  data.frame(time, threshold, value)
    df
    #   time threshold value
    #1     1         1     5
    #2     2         1     3
    #3     3         1     2
    #4     4         0     4
    #5     5         0     6
    #6     6         2     9
    #7     7         2     1
    #8     8         1    10
    #9     9         1     3
    #10   10         2     5
    #11   11         2     2
    #12   12         0     1