Search code examples
rdata-manipulation

Flagging data ids based on a condition in R


I need a conditional modification to my dataset. Here is a sample dataset.

data <- data.frame(id = c(1,1,1,1,1,1, 2,2,2, 3,3,3),
                   cat1 = c("A","A","A","B","B","B", "A","A","A", "A","A","B"),
                   levels = c("L1","L3","L4","L2","L1","L3", "L1","L2","L2", "L1","L2","L1"))

> data
   id cat1 levels
1   1    A     L1
2   1    A     L3
3   1    A     L4
4   1    B     L2
5   1    B     L1
6   1    B     L3
7   2    A     L1
8   2    A     L2
9   2    A     L2
10  3    A     L1
11  3    A     L2
12  3    B     L1

a) For each id, if cat1 == "A" has L3 or L4, that id should have cat1 == "B". This is the main rule. [Rule_satisfied]

b) If cat1 == "A" has L1 or L2, that id should not have cat1 == "B" [Rule_NotSatisfied]

c) If cat1 == "A" has L1 or L2, that id has cat1 == "B", then this is a violation of the rule. [Rule_violation]

How can I get the desired output as below?

> data.1
   id cat1 levels                  label
1   1    A     L1         Rule_satisfied
2   1    A     L3         Rule_satisfied
3   1    A     L4         Rule_satisfied
4   1    B     L2         Rule_satisfied
5   1    B     L1         Rule_satisfied
6   1    B     L3         Rule_satisfied
7   2    A     L1      Rule_NotSatisfied
8   2    A     L2      Rule_NotSatisfied
9   2    A     L2      Rule_NotSatisfied
10  3    A     L1      Rule_violation
11  3    A     L2      Rule_violation
12  3    B     L1      Rule_violation

Solution

  • Perhaps this use of dplyr::group_by and dplyr::case_when.

    library(dplyr)
    data %>%
      group_by(id) %>%
      mutate(
        label = case_when(
          any(cat1 == "A" & levels %in% c("L3", "L4")) && "B" %in% cat1 ~ "Rule_satisfied",
          any(cat1 == "A" & levels %in% c("L1", "L2")) && !"B" %in% cat1 ~ "Rule_NotSatisfied",
          any(cat1 == "A" & levels %in% c("L1", "L2")) && "B" %in% cat1 ~ "Rule_violation"
        )
      ) %>%
      ungroup()
    # # A tibble: 12 × 4
    #       id cat1  levels label            
    #    <dbl> <chr> <chr>  <chr>            
    #  1     1 A     L1     Rule_satisfied   
    #  2     1 A     L3     Rule_satisfied   
    #  3     1 A     L4     Rule_satisfied   
    #  4     1 B     L2     Rule_satisfied   
    #  5     1 B     L1     Rule_satisfied   
    #  6     1 B     L3     Rule_satisfied   
    #  7     2 A     L1     Rule_NotSatisfied
    #  8     2 A     L2     Rule_NotSatisfied
    #  9     2 A     L2     Rule_NotSatisfied
    # 10     3 A     L1     Rule_violation   
    # 11     3 A     L2     Rule_violation   
    # 12     3 B     L1     Rule_violation