I need a conditional modification to my dataset. Here is a sample dataset.
data <- data.frame(id = c(1,1,1,1,1,1, 2,2,2, 3,3,3),
cat1 = c("A","A","A","B","B","B", "A","A","A", "A","A","B"),
levels = c("L1","L3","L4","L2","L1","L3", "L1","L2","L2", "L1","L2","L1"))
> data
id cat1 levels
1 1 A L1
2 1 A L3
3 1 A L4
4 1 B L2
5 1 B L1
6 1 B L3
7 2 A L1
8 2 A L2
9 2 A L2
10 3 A L1
11 3 A L2
12 3 B L1
a) For each id
, if cat1 == "A"
has L3
or L4
, that id
should have cat1 == "B"
.
This is the main rule. [Rule_satisfied
]
b) If cat1 == "A"
has L1
or L2
, that id
should not have cat1 == "B"
[Rule_NotSatisfied
]
c) If cat1 == "A"
has L1
or L2
, that id
has cat1 == "B"
, then this is a violation of the rule. [Rule_violation
]
How can I get the desired output as below?
> data.1
id cat1 levels label
1 1 A L1 Rule_satisfied
2 1 A L3 Rule_satisfied
3 1 A L4 Rule_satisfied
4 1 B L2 Rule_satisfied
5 1 B L1 Rule_satisfied
6 1 B L3 Rule_satisfied
7 2 A L1 Rule_NotSatisfied
8 2 A L2 Rule_NotSatisfied
9 2 A L2 Rule_NotSatisfied
10 3 A L1 Rule_violation
11 3 A L2 Rule_violation
12 3 B L1 Rule_violation
Perhaps this use of dplyr::group_by
and dplyr::case_when
.
library(dplyr)
data %>%
group_by(id) %>%
mutate(
label = case_when(
any(cat1 == "A" & levels %in% c("L3", "L4")) && "B" %in% cat1 ~ "Rule_satisfied",
any(cat1 == "A" & levels %in% c("L1", "L2")) && !"B" %in% cat1 ~ "Rule_NotSatisfied",
any(cat1 == "A" & levels %in% c("L1", "L2")) && "B" %in% cat1 ~ "Rule_violation"
)
) %>%
ungroup()
# # A tibble: 12 × 4
# id cat1 levels label
# <dbl> <chr> <chr> <chr>
# 1 1 A L1 Rule_satisfied
# 2 1 A L3 Rule_satisfied
# 3 1 A L4 Rule_satisfied
# 4 1 B L2 Rule_satisfied
# 5 1 B L1 Rule_satisfied
# 6 1 B L3 Rule_satisfied
# 7 2 A L1 Rule_NotSatisfied
# 8 2 A L2 Rule_NotSatisfied
# 9 2 A L2 Rule_NotSatisfied
# 10 3 A L1 Rule_violation
# 11 3 A L2 Rule_violation
# 12 3 B L1 Rule_violation