In a dataset in which I have an activity, start and stop times, and an id, I want to merge values of two rows in the same column and update other columns if several conditions apply. First a data example:
library(data.table)
DT <- data.table(person=c(1,1,1,1,2,2,2,2,2,3,3,3,3),
activity=c("grab", "walk", "remove", "delete", "run", "talk", "walk", "remove",
"grab", "walk", "delete", "talk", "remove"),
start_time=c(0,1,3,6,0,2,2,3,3,3,6,6,7), stop_time=c(1,3,5,7,1,4,4,8,4,5,7,7,8))
DT
I want to update start and stop times and merge column 'activity' for each person if:
The updated row should reflect the start and stop times of the combined activity, and all rows, except the one updated should be removed. Below is the goal I want to achieve with the data sample I provided:
DT.goal <- data.table(person=c(1,1,2,2,3,3),
activity=c("grab + walk + remove", "delete", "run", "talk + walk + grab + remove",
"walk", "delete + talk + remove"),
start_time=c(0,6,0,2,3,6), stop_time=c(5,7,1,8,5,8))
DT.goal
So far I've come up with the following incomplete attempt:
DT.test <- DT[start_time <= shift(stop_time, 1L, type="lag"),
cond := T, by=person]
DT.test <- DT.test[cond==T,
new_activity := paste(activity, shift(activity, 1L, type="lag")), by=person]
DT.test <- DT.test[, new_start := start_time, by=person][cond==T, new_start := min(start_time), by=person]
DT.test <- DT.test[, new_stop := stop_time, by=person][cond==T, new_stop := max(stop_time), by=person]
However, using the shift(, type="lag)
is not very useful for the first row for each person, as it has no previous row to look at. Also, paste()
pastes NA if the condition does not evaluate to TRUE
.
Can anyone help me on my way?
Please check the following:
library(data.table)
DT <- data.table(person=c(1,1,1,1,2,2,2,2,2,3,3,3,3),
activity=c("grab", "walk", "remove", "delete", "run", "talk", "walk", "remove",
"grab", "walk", "delete", "talk", "remove"),
start_time=c(0,1,3,6,0,2,2,3,3,3,6,6,7), stop_time=c(1,3,5,7,1,4,4,8,4,5,7,7,8))
setorder(DT, person, start_time)
DT[, concatenate := start_time %in% stop_time | stop_time %in% start_time | duplicated(start_time) | duplicated(start_time, fromLast=TRUE), by = "person"]
DT[, concatenate_grp := rleid(concatenate), by = "person"]
DT[, paste(activity, collapse = " + "), by = c("person", "concatenate_grp")]
DT.goal <- DT[, .(activity = paste(activity, collapse = " + "), start_time = min(start_time), stop_time = max(stop_time)), by = c("person", "concatenate_grp")][, concatenate_grp := NULL]
Which results in:
person activity start_time stop_time
1: 1 grab + walk + remove 0 5
2: 1 delete 6 7
3: 2 run 0 1
4: 2 talk + walk + remove + grab 2 8
5: 3 walk 3 5
6: 3 delete + talk + remove 6 8