I have a dataset with two colums, ID, Group.
ID Group
10 Red
11 Red
13 Blue
13 Red
15 Blue
15 Blue
17 Blue
17 Red
17 Red
19 Blue
19 Undecided
23 Blue
23 Undecided
23 Undecided
My goal is to create a third indicator column that takes a value based on the combination of values in ID and Group. Expecting a third column , Indicator
like this.
ID Group Indicator
10 Blue Single
11 Blue Single
13 Blue Blue-Red
13 Red Blue-Red
15 Blue Blue-Blue
15 Blue Blue-Blue
17 Blue Blue-Red
17 Red Blue-Red
17 Red Blue-Red
19 Blue Blue-Undecided
19 Undecided Blue-Undecided
23 Blue Blue-Undecided
23 Undecided Blue-Undecided
23 Undecided Blue-Undecided
I tried case_when
but it became too complicated and gave up. Any help in much appreciated. Thanks.
Here's a case_when
version using dplyr
that doesn't have too many cases
library(dplyr)
dd %>%
mutate(Indicator = case_when(
n()==1 ~ "Single",
n_distinct(Group)==1 ~ paste(rep(first(Group),2), collapse="-"),
TRUE ~ paste(sort(unique(Group)), collapse="-")
), .by=ID)
which produces
ID Group Indicator
1 10 Red Single
2 11 Red Single
3 13 Blue Blue-Red
4 13 Red Blue-Red
5 15 Blue Blue-Blue
6 15 Blue Blue-Blue
7 17 Blue Blue-Red
8 17 Red Blue-Red
9 17 Red Blue-Red
10 19 Blue Blue-Undecided
11 19 Undecided Blue-Undecided
12 23 Blue Blue-Undecided
13 23 Undecided Blue-Undecided
14 23 Undecided Blue-Undecided
tested with
dd <- data.frame(
ID = c(10L, 11L, 13L, 13L, 15L, 15L, 17L, 17L, 17L, 19L, 19L, 23L, 23L, 23L),
Group = c("Red", "Red", "Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue", "Undecided", "Blue", "Undecided", "Undecided")
)