I have a CSV file that looks like this:
What I want to get is this:
The activity instance is needed to identify which events belong together and which do not. This instance identifier should be unique, also across different cases and activities. I have no idea how to generate those IDs. Is there a library, for example in Python, that could handle this?
In R you could try the following using dplyr. Using arrange you can ensure your data is sorted by patient and in chronological order. Then the activity_instance will be a number that is incremented whenever the patient or activity changes going from row to row.
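library(dplyr)

df %>%
  # Sort by patient, then chronologically within each patient
  arrange(patient, timestamp) %>%
  # Start at 1 and add 1 whenever patient or activity differs from the previous row
  mutate(activity_instance = 1 + cumsum(
    patient != lag(patient, default = first(patient)) |
      activity != lag(activity, default = first(activity))
  ))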
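
Since the question asks about Python, here is a minimal sketch of the same idea using only the standard library. It assumes the columns are named patient, activity and timestamp as in the R code, and that timestamps are strings that sort chronologically (for example ISO-style). The function name add_activity_instance and the sample rows are hypothetical, made up for illustration.

```python
from operator import itemgetter


def add_activity_instance(rows):
    """Return the rows sorted by (patient, timestamp), each with an
    'activity_instance' counter that increases whenever the patient
    or the activity differs from the previous row."""
    # Assumption: timestamps are strings that sort chronologically.
    ordered = sorted(rows, key=itemgetter("patient", "timestamp"))
    result = []
    instance = 0
    previous = None
    for row in ordered:
        key = (row["patient"], row["activity"])
        if key != previous:
            # New patient or new activity: start a new instance.
            instance += 1
            previous = key
        result.append({**row, "activity_instance": instance})
    return result


# Hypothetical sample data, deliberately not in order.
events = [
    {"patient": "p2", "activity": "check-in", "timestamp": "2020-01-01 08:30"},
    {"patient": "p1", "activity": "x-ray", "timestamp": "2020-01-01 09:00"},
    {"patient": "p1", "activity": "check-in", "timestamp": "2020-01-01 08:00"},
    {"patient": "p1", "activity": "check-in", "timestamp": "2020-01-01 10:00"},
    {"patient": "p1", "activity": "check-in", "timestamp": "2020-01-01 08:05"},
]

labelled = add_activity_instance(events)
for row in labelled:
    print(row["patient"], row["timestamp"], row["activity"], row["activity_instance"])
```

Because the counter runs over the whole sorted data set and is never reset, the IDs stay unique across patients and activities. To read the rows from a CSV file, csv.DictReader from the standard library yields dictionaries in this shape.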