
Apply a rule based on row values of a dataset


I'm a beginner in R. I have already written a few programs, but I have never run into the problem described here. This is the TSE (time-stamped event) data frame I'm working with:

   ID TIME EVENT
1 150    1     A
2 150    2     B
3 150    2     C
4 150    2     D
5 151    1     C
6 151    2     A
7 151    3     B
8 151    3     D

This data frame contains 3 variables:

ID: identifier of the person,

TIME: time index,

EVENT: an event that occurs at a given time.

When two or more events occur at the same time value (TIME) for the same ID, I want to keep only one of them and drop the other row(s), based on a preference rule. Let's suppose the rule is C>B>A>D (where ">" means "is preferred to").

So, in my example, I would like to keep only these rows :

   ID TIME EVENT
1 150    1     A
3 150    2     C
5 151    1     C
6 151    2     A
7 151    3     B

You can see that rows 2, 4 and 8 were dropped because of the rule.

I guess this shouldn't be too tricky to program, but I really have no idea how to write it.

Thank you all in advance.

Jérémie P.


Solution

  • Here's a possible solution using dplyr.

    First reproduce your data

    DF <- data.frame(ID = rep(150:151, each=4), 
                     time=c(1, 2, 2, 2, 1, 2, 3, 3), 
                     EVENT=c("A", "B", "C", "D", "C", "A", "B", "D"))
    
    target_rule <- c("C", "B", "A", "D")
    

    Then we can combine a few dplyr verbs to group, sort, and select. Below I use a factor version of your EVENT, with its levels in the order of your target rule, so that sorting on it follows the rule.

    library("dplyr")
    DF %>% 
      group_by(ID, time) %>%                               # Consider each combo of ID and time    
      mutate(fevent=factor(EVENT, levels=target_rule)) %>% # Create ordered version of EVENT 
      arrange(fevent) %>%                                  # Sort according to rule
      summarise(EVENT=first(EVENT)) %>%                    # Pick just the first 
      ungroup() %>% 
      arrange(ID) 
    

    This produces

    # A tibble: 5 x 3
         ID  time EVENT
      <int> <dbl> <fct>
    1   150     1 A
    2   150     2 C
    3   151     1 C
    4   151     2 A
    5   151     3 B

    In R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so EVENT is shown as <chr> instead of <fct>.
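
If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only one possible approach, not the method from the answer above: it swaps the factor for `match()`, which returns each event's position in `target_rule` (1 = most preferred), and then keeps the first row of every (ID, time) pair after sorting. The names `rank_in_rule`, `sorted` and `result` are made up for this illustration.

```r
# Data and rule as in the answer above
DF <- data.frame(ID = rep(150:151, each = 4),
                 time = c(1, 2, 2, 2, 1, 2, 3, 3),
                 EVENT = c("A", "B", "C", "D", "C", "A", "B", "D"),
                 stringsAsFactors = FALSE)
target_rule <- c("C", "B", "A", "D")

# Position of each event in the rule: 1 = most preferred
rank_in_rule <- match(DF$EVENT, target_rule)

# Sort by ID, then time, then preference rank
sorted <- DF[order(DF$ID, DF$time, rank_in_rule), ]

# duplicated() flags every row after the first one of each (ID, time) pair,
# so negating it keeps only the most preferred event per pair
result <- sorted[!duplicated(sorted[c("ID", "time")]), ]
result
```

Because the row names are left untouched, `result` still carries the original row numbers (1, 3, 5, 6, 7), which makes it easy to compare with the expected output in the question. Note that an event missing from `target_rule` gets a rank of `NA` and is sorted last within its group by `order()`.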