I want to create an ID for group members that is shared within some groups and unique per member in others. Only a few groups need unique per-member IDs, so I can specify those manually in the code or as a list.
Imagine data like this:
data <- data.frame(
Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)
Basically, I want Group 1 to share a common ID (Alice = 1, Bob = 1), Group 2 to get globally unique IDs that vary within the group (Charlie = 2, David = 3, Eve = 4), and then Group 3 to switch back to a shared ID.
The ideal final data would look like this:
data <- data.frame(
Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John"),
ID = c(1,1,2,3,4,5,5,6,6,6)
)
I am most familiar with dplyr, so I have been experimenting with:
data %>%
  group_by(Group) %>%
  mutate(ID = row_number())
and variations thereof, including case_when and ifelse statements.
The ideal outcome would be code that uses case_when or ifelse inside mutate to let me specify n groups that should get different IDs within the group, while something like the TRUE ~ branch of case_when assigns a common, globally unique ID to each of the remaining groups.
Perhaps this would suit your use-case?
library(dplyr, warn.conflicts = FALSE)
data <- data.frame(
Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)
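data %>%
  # Number the rows within each group (.by needs dplyr >= 1.1.0)
  mutate(Group_number = row_number(), .by = Group) %>%
  # Flag every row that should start a new ID
  mutate(tmp = case_when(Group == "Group2" ~ 1,      # every row of a 'dissimilar' group
                         Group == "Group4" ~ 1,
                         Group_number == 1 ~ 1,      # first row of any other group
                         TRUE ~ 0)) %>%
  # The running total of the flags is the ID
  mutate(ID = cumsum(tmp)) %>%
  select(-c(Group_number, tmp))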
#>     Group  Member ID
#> 1  Group1   Alice  1
#> 2  Group1     Bob  1
#> 3  Group2 Charlie  2
#> 4  Group2   David  3
#> 5  Group2     Eve  4
#> 6  Group3   Frank  5
#> 7  Group3   Grace  5
#> 8  Group4   Helen  6
#> 9  Group4     Ivy  7
#> 10 Group4    John  8
Created on 2023-10-18 with reprex v2.0.2
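If the list of 'dissimilar' groups grows, the same cumsum idea can probably be driven by a vector instead of one case_when line per group. This is only a sketch: `dissimilar_groups` and `first_in_group` are names I made up for illustration, and it assumes (like the code above) that the rows of each group sit next to each other.

```r
library(dplyr, warn.conflicts = FALSE)

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2",
            "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Helen", "Ivy", "John")
)

# Hypothetical name: the groups whose members each get their own ID
dissimilar_groups <- c("Group2", "Group4")

result <- data %>%
  # TRUE on the first row of each group (.by needs dplyr >= 1.1.0)
  mutate(first_in_group = row_number() == 1, .by = Group) %>%
  # A new ID starts on every row of a dissimilar group,
  # and on the first row of every other group
  mutate(ID = cumsum(Group %in% dissimilar_groups | first_in_group)) %>%
  select(-first_in_group)

result$ID
#> [1] 1 1 2 3 4 5 5 6 7 8
```

Changing `dissimilar_groups` to `c("Group2")` should give 6, 6, 6 for Group4 instead.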
EDIT: based on the comment below, only Group2 should be 'dissimilar':
library(dplyr, warn.conflicts = FALSE)
data <- data.frame(
Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)
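data %>%
  # Number the rows within each group (.by needs dplyr >= 1.1.0)
  mutate(Group_number = row_number(), .by = Group) %>%
  # Flag every row that should start a new ID
  mutate(tmp = case_when(Group == "Group2" ~ 1,      # every row of the 'dissimilar' group
                         # Group == "Group4" ~ 1,
                         Group_number == 1 ~ 1,      # first row of any other group
                         TRUE ~ 0)) %>%
  # The running total of the flags is the ID
  mutate(ID = cumsum(tmp)) %>%
  select(-c(Group_number, tmp))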
#>     Group  Member ID
#> 1  Group1   Alice  1
#> 2  Group1     Bob  1
#> 3  Group2 Charlie  2
#> 4  Group2   David  3
#> 5  Group2     Eve  4
#> 6  Group3   Frank  5
#> 7  Group3   Grace  5
#> 8  Group4   Helen  6
#> 9  Group4     Ivy  6
#> 10 Group4    John  6
Created on 2023-10-18 with reprex v2.0.2
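One caveat: the cumsum approach relies on the rows of each group being contiguous. If that can't be guaranteed, a possible alternative is to build a key per row with if_else and number the distinct keys in order of first appearance. This is a rough sketch, not a tested-in-production solution: `assign_ids`, `dissimilar_groups` and `key` are hypothetical names, and it assumes no real group name collides with a pasted key such as "Group2_3".

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: df needs a character column called Group
assign_ids <- function(df, dissimilar_groups) {
  df %>%
    mutate(
      # Dissimilar groups get one key per row; other groups share the group name
      key = if_else(Group %in% dissimilar_groups,
                    paste(Group, row_number(), sep = "_"),
                    Group),
      # Number the distinct keys in order of first appearance
      ID = match(key, unique(key))
    ) %>%
    select(-key)
}

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2",
            "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Helen", "Ivy", "John")
)

result <- assign_ids(data, dissimilar_groups = "Group2")
result$ID
#> [1] 1 1 2 3 4 5 5 6 6 6
```

Because the ID comes from match() rather than a running total, a group whose rows are interleaved with other groups should still end up with a single shared ID.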