Tags: r, dplyr, group-by, case, mutate

Group IDs in R: common within some groups, unique within others


I want to create an ID for group members that is shared within some groups and unique per member within others. Only a few groups need per-member IDs, so I can specify those manually in the code or as a list.

Imagine data like this:

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
)

Basically, I want Group1 to share a common ID (Alice = 1, Bob = 1), Group2 to get globally unique IDs that vary within the group (Charlie = 2, David = 3, Eve = 4), and Group3 to switch back to a shared ID.

Final ideal data would hypothetically look like this:

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John"),
  ID = c(1, 1, 2, 3, 4, 5, 5, 6, 6, 6)
)

I am most familiar with dplyr, so I have been experimenting with:

data %>%
  group_by(Group) %>%
  mutate(ID = row_number())

And variations thereof including case_when and ifelse statements.

The ideal outcome would be code that uses case_when or ifelse inside mutate to let me specify n groups whose members should get different IDs, while something like the TRUE ~ branch of case_when assigns a globally unique common ID to each of the remaining groups.


Solution

  • Perhaps this would suit your use-case?

    library(dplyr, warn.conflicts = FALSE)
    
    data <- data.frame(
      Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
      Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
    )
    
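    data %>%
      # Number the rows within each group
      mutate(Group_number = row_number(), .by = Group) %>%
      # Flag rows that start a new ID: every row of a "dissimilar" group,
      # plus the first row of every other group
      mutate(tmp = case_when(Group == "Group2" ~ 1,
                             Group == "Group4" ~ 1,
                             Group_number == 1 ~ 1,
                             TRUE ~ 0)) %>%
      # The running total of the flags is the ID
      mutate(ID = cumsum(tmp)) %>%
      select(-c(Group_number, tmp))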
    #>     Group  Member ID
    #> 1  Group1   Alice  1
    #> 2  Group1     Bob  1
    #> 3  Group2 Charlie  2
    #> 4  Group2   David  3
    #> 5  Group2     Eve  4
    #> 6  Group3   Frank  5
    #> 7  Group3   Grace  5
    #> 8  Group4   Helen  6
    #> 9  Group4     Ivy  7
    #> 10 Group4    John  8
    

    Created on 2023-10-18 with reprex v2.0.2
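
To see why the approach above works, it can help to keep the helper columns instead of dropping them. The following is a minimal sketch, assuming dplyr 1.1.0 or later (needed for `.by`). The column names `first_in_group` and `starts_new_id` are illustrative and not part of the original answer. Each row that should begin a new ID is flagged, and the running count of flags becomes the ID.

```r
library(dplyr, warn.conflicts = FALSE)

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2",
            "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Helen", "Ivy", "John")
)

steps <- data %>%
  # TRUE on the first row of each group, FALSE elsewhere
  mutate(first_in_group = row_number() == 1, .by = Group) %>%
  # A row starts a new ID if it opens a group or sits in a "dissimilar" group
  mutate(starts_new_id = first_in_group | Group %in% c("Group2", "Group4")) %>%
  # cumsum() over the logical flags counts how many IDs have started so far
  mutate(ID = cumsum(starts_new_id))

steps
```

Because the ID is a running total, this relies on the rows of each group sitting next to each other in the data frame.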


    EDIT: based on the comment below, only Group2 should be 'dissimilar':

    library(dplyr, warn.conflicts = FALSE)
    
    data <- data.frame(
      Group = c("Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group4", "Group4", "Group4"),
      Member = c("Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Helen", "Ivy", "John")
    )
    
    data %>%
      mutate(Group_number = row_number(), .by = Group) %>%
      mutate(tmp = case_when(Group == "Group2" ~ 1,
                             # Group == "Group4" ~ 1,
                             Group_number == 1 ~ 1,
                             TRUE ~ 0)) %>%
      mutate(ID = cumsum(tmp)) %>%
      select(-c(Group_number, tmp))
    #>     Group  Member ID
    #> 1  Group1   Alice  1
    #> 2  Group1     Bob  1
    #> 3  Group2 Charlie  2
    #> 4  Group2   David  3
    #> 5  Group2     Eve  4
    #> 6  Group3   Frank  5
    #> 7  Group3   Grace  5
    #> 8  Group4   Helen  6
    #> 9  Group4     Ivy  6
    #> 10 Group4    John  6
    

    Created on 2023-10-18 with reprex v2.0.2
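
Since the question mentions supplying the dissimilar groups as a list, one possible generalization is to pass them as a character vector and test membership with `%in%` rather than writing one `case_when` branch per group. This is only a sketch: `add_mixed_ids` and its `dissimilar` argument are hypothetical names, `.by` assumes dplyr 1.1.0 or later, and the rows of each group are assumed to be contiguous (sort by `Group` first if they are not).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: `dissimilar` is a character vector of group names
# whose members should each receive their own ID.
# Assumes the rows of each group are contiguous in `df`.
add_mixed_ids <- function(df, dissimilar) {
  df %>%
    # Mark the first row of every group
    mutate(first_in_group = row_number() == 1, .by = Group) %>%
    # Start a new ID on group boundaries and on every row of a dissimilar group
    mutate(ID = cumsum(first_in_group | Group %in% dissimilar)) %>%
    select(-first_in_group)
}

data <- data.frame(
  Group = c("Group1", "Group1", "Group2", "Group2", "Group2",
            "Group3", "Group3", "Group4", "Group4", "Group4"),
  Member = c("Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Helen", "Ivy", "John")
)

# Only Group2 gets per-member IDs
result <- add_mixed_ids(data, dissimilar = "Group2")
result

# Several dissimilar groups can be listed at once
result_two <- add_mixed_ids(data, dissimilar = c("Group2", "Group4"))
```

The first call should give the same IDs as the edited answer, and the second the same IDs as the original one.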