I am trying to find a way to report the maximum value of previous rows in groups of rows, using either tidyverse
or data.table
.
Let's say I have the following data frame, in which Start and End represent start and end frames of whichever behaviours:
Example <- data.frame(Group = c(rep(1, 6), rep(2, 6)),
Individual = c("A", "B", "C", "A", "B", "C", "A", "C", "B", "A", "B", "C"),
Start = c(15, 30, 35, 37, 50, 61, 66, 83, 88, 93, 108, 130),
End = c(26, 48, 41, 44, 72, 64, 71, 95, 91, 94, 115, 150))
I would like to create a new column, End_multi, which should take the maximum value of all previous rows in the same group.
I tried with data.table
. Here is the step-by-step code:
library(data.table)
Example_dt <- as.data.table(Example)
Example_dt2 <- copy(Example_dt)
Example_dt2[, End_multi := shift(End, 1:2, type = "lag"), by = Group]
Example_dt3 <- copy(Example_dt2)
Example_dt3[, End_multi := max(unlist(shift(End, 1:2, type = "lag")), na.rm = TRUE), by = Group]
I have several issues with this code:
The ideal output for me would be something like:
Example_desired <- data.frame(Group = c(rep(1, 6), rep(2, 6)),
Individual = c("A", "B", "C", "A", "B", "C", "A", "C", "B", "A", "B", "C"),
Start = c(15, 30, 35, 37, 50, 61, 66, 83, 88, 93, 108, 130),
End = c(26, 48, 41, 44, 72, 64, 71, 95, 91, 94, 115, 150),
End_multi = c(26, 48, 48, 48, 72, 72, 71, 95, 95, 95, 115, 150))
Do you know what I need to change in data.table
to achieve this or have another way of doing it in the tidyverse
?
As yuk mentioned, cummax is what you're looking for- here is a data.table solution:
setDT(Example)
#I would like to create a new column,
# End_multi, which should
# take the maximum value of
# all previous rows in the same group.
Example[, `:=`(Start_multi=cummax(Start),
End_multi=cummax(End)),
by = Group]
Example
# Group Individual Start End Start_multi End_multi
# <num> <char> <num> <num> <num> <num>
# 1: 1 A 15 26 15 26
# 2: 1 B 30 48 30 48
# 3: 1 C 35 41 35 48
# 4: 1 A 37 44 37 48
# 5: 1 B 50 72 50 72
# 6: 1 C 61 64 61 72
# 7: 2 A 66 71 66 71
# 8: 2 C 83 95 83 95
# 9: 2 B 88 91 88 95
#10: 2 A 93 94 93 95
#11: 2 B 108 115 108 115
#12: 2 C 130 150 130 150
Just make sure your data is sorted as you'd like it:
Example[order(Group, Start)]