python r csv uuid event-log

Create activity_instance for event logs


I have a CSV file that looks like this:

enter image description here

What I want to get is this:

enter image description here

The activity instance is needed to identify which events belong together and which do not. This instance identifier should be unique, also across different cases and activities. I have no idea how to generate those IDs. Is there a library, for example in Python, that could handle this?


Solution

  • In R you could try the following using dplyr.

    Using arrange, you can ensure your data is sorted by patient and in chronological order. The activity_instance is then a number that is incremented whenever the patient or the activity changes from one row to the next.

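    library(dplyr)

    df %>%
      # Sort by patient, then chronologically within each patient
      arrange(patient, timestamp) %>%
      # Count the rows where patient or activity differs from the previous row
      mutate(activity_instance = 1 + cumsum(
        (patient != lag(patient, default = first(patient)) |
         activity != lag(activity, default = first(activity)))))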
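Since the question asks about Python, the same idea can be sketched there using only the standard library. This is a rough sketch rather than a definitive implementation: the function name `add_activity_instance` and the sample rows are hypothetical, and the column names `patient`, `activity` and `timestamp` are assumed from the R snippet above, not taken from the original CSV. It also assumes the timestamps sort chronologically (for example ISO-style strings or `datetime` objects).

```python
from operator import itemgetter


def add_activity_instance(rows):
    """Return new row dicts with an 'activity_instance' counter added.

    Assumes each row is a dict with 'patient', 'activity' and 'timestamp'
    keys (names assumed from the R snippet, not from the original CSV).
    """
    # Sort by patient, then by timestamp, like arrange(patient, timestamp).
    ordered = sorted(rows, key=itemgetter("patient", "timestamp"))
    result = []
    instance = 0
    previous_key = None
    for row in ordered:
        key = (row["patient"], row["activity"])
        # Start a new instance whenever the patient or the activity changes.
        if key != previous_key:
            instance += 1
            previous_key = key
        result.append({**row, "activity_instance": instance})
    return result


# Hypothetical sample data, for illustration only.
events = [
    {"patient": "p1", "activity": "check-in", "timestamp": "2020-01-01 08:00"},
    {"patient": "p1", "activity": "check-in", "timestamp": "2020-01-01 08:05"},
    {"patient": "p1", "activity": "x-ray", "timestamp": "2020-01-01 09:00"},
    {"patient": "p2", "activity": "x-ray", "timestamp": "2020-01-01 09:30"},
    {"patient": "p1", "activity": "check-in", "timestamp": "2020-01-02 08:00"},
]
labelled = add_activity_instance(events)
print([row["activity_instance"] for row in labelled])
```

Rows could be read from the CSV file with `csv.DictReader`. If opaque identifiers are preferred over a running counter, one option is to generate a `uuid.uuid4()` each time the key changes instead of incrementing `instance`.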