Search code examples
rgroup-bytail

How to return the last matching condition row per group?


I am working on a file with many machinas. Each machina works during a cycle (which goes from -1 to -1) and I have some operation parameters:

1: High power operation 0.5: Low power operation 0: No operation -1: End of operation

I have a list of my machinas consisting of these kind of dataframe (one per machina - this one is a really small sample) :

    *Indx*  *N°1 Operation*  *N°1 Operation length*
       1           1                450
       1          0.5                84
       1           0                 48
       1           1                  4
       1          0.5                 4
       1           1                123
       1          0.5                14
       1          -1                 45
       2           1                471
       2           0                 47
       2          0.5                44
       2           0                145
       2          0.5                78
       2           1                 71
       2          0.5                19
       2           0                  2
       2          -1                 45

I would like to get for each group the last row with 1 value (high power operation). My purpose is then to sum the lengths from this last High operation value to end of cycle.

Desired output :

*Indx*    *N°1 Operation length*
  1                  123+14
  2                  71+19+2

How can I get to that ?


Solution

  • Using dplyr one way is to filter the End of operation rows from the data, group_by Indx and sum the Operation2 values which occur between last occurrence when Operation1 is 1 till the last row. We find the last occurrence of 1 using cumsum value.

    library(dplyr)
    
    df %>%
      filter(Operation1 != -1) %>%
      group_by(Indx) %>%
      summarise(Oplength = sum(Operation2[cumsum(Operation1 == 1) == 
                                      max(cumsum(Operation1 == 1))]))
    
    # A tibble: 2 x 2
    #   Indx Oplength
    #  <int>    <int>
    #1     1      137
    #2     2       92
    

    Or another way to find last occurrence is using which and max

    df %>%
      filter(Operation1 != -1) %>%
      group_by(Indx) %>%
      summarise(Oplength = sum(Operation2[max(which(Operation1 == 1)) : n()]))
    

    data

    df <- structure(list(Indx = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L), Operation1 = c(1, 0.5, 0, 1, 0.5, 
    1, 0.5, -1, 1, 0, 0.5, 0, 0.5, 1, 0.5, 0, -1), Operation2 = c(450L, 
    84L, 48L, 4L, 4L, 123L, 14L, 45L, 471L, 47L, 44L, 145L, 78L, 
    71L, 19L, 2L, 45L)), class = "data.frame", row.names = c(NA, -17L))