Search code examples
rdata.table

How to report the maximum value of only previous rows in a group of rows in R?


I am trying to find a way to report the maximum value of previous rows in groups of rows, using either tidyverse or data.table.

Let's say I have the following data frame, in which Start and End represent start and end frames of whichever behaviours:

Example <- data.frame(Group = c(rep(1, 6), rep(2, 6)),
                      Individual = c("A", "B", "C", "A", "B", "C", "A", "C", "B", "A", "B", "C"),
                      Start = c(15, 30, 35, 37, 50, 61, 66, 83, 88, 93, 108, 130),
                      End = c(26, 48, 41, 44, 72, 64, 71, 95, 91, 94, 115, 150))

I would like to create a new column, End_multi, which should take the maximum value of all previous rows in the same group.

I tried with data.table. Here is the step-by-step code:

library(data.table)

Example_dt <- as.data.table(Example)

Example_dt2 <- copy(Example_dt)
Example_dt2[, End_multi := shift(End, 1:2, type = "lag"), by = Group]

Example_dt3 <- copy(Example_dt2)
Example_dt3[, End_multi := max(unlist(shift(End, 1:2, type = "lag")), na.rm = TRUE), by = Group]

I have several issues with this code:

  • it does not start from the start of the group of rows, as it only takes the last 2 rows into account.
  • when reporting the maximum value, it reports the overall maximum of the lagged values of the group but I would like to have the maximum value in the lagged rows up to the current row.

The ideal output for me would be something like:

Example_desired <- data.frame(Group = c(rep(1, 6), rep(2, 6)),
                              Individual = c("A", "B", "C", "A", "B", "C", "A", "C", "B", "A", "B", "C"),
                              Start = c(15, 30, 35, 37, 50, 61, 66, 83, 88, 93, 108, 130),
                              End = c(26, 48, 41, 44, 72, 64, 71, 95, 91, 94, 115, 150),
                              End_multi = c(26, 48, 48, 48, 72, 72, 71, 95, 95, 95, 115, 150))

Do you know what I need to change in data.table to achieve this or have another way of doing it in the tidyverse?


Solution

  • As yuk mentioned, cummax is what you're looking for- here is a data.table solution:

    setDT(Example)
    
    
    #I would like to create a new column, 
    # End_multi, which should 
    # take the maximum value of 
    # all previous rows in the same group.
    
    Example[, `:=`(Start_multi=cummax(Start),
                   End_multi=cummax(End)),
            by = Group]
    Example
    #    Group Individual Start   End Start_multi End_multi
    #    <num>     <char> <num> <num>       <num>     <num>
    # 1:     1          A    15    26          15        26
    # 2:     1          B    30    48          30        48
    # 3:     1          C    35    41          35        48
    # 4:     1          A    37    44          37        48
    # 5:     1          B    50    72          50        72
    # 6:     1          C    61    64          61        72
    # 7:     2          A    66    71          66        71
    # 8:     2          C    83    95          83        95
    # 9:     2          B    88    91          88        95
    #10:     2          A    93    94          93        95
    #11:     2          B   108   115         108       115
    #12:     2          C   130   150         130       150
    

    Just make sure your data is sorted as you'd like it: Example[order(Group, Start)]