Is there an R function that checks whether all values in a group are the same as all values in another group?


Data I have:

A B
1 a
2 c
2 e
3 f
4 h
5 c
5 e

What I want:

A B Group
1 a 1
2 c 2
2 e 2
3 f 3
4 h 4
5 c 2
5 e 2

Code I attempted:

library(readxl)
library(dplyr)
library(stringr)
data1 <- read_excel("testing.xlsx")
data2 <- data1 %>% 
  group_by(A) %>% 
  group_by(B) %>% 
  mutate(Group = cur_group_id()) %>% 
  ungroup()

What I’m getting from this code:

A B Group
1 a 1
2 c 2
2 e 3
3 f 4
4 h 5
5 c 2
5 e 3

EDIT: I get the error “Can’t supply ‘.by’ when ‘.data’ is a grouped data frame.” with every suggestion below. The original data I am manipulating has been left-joined and then grouped. How do I approach this?


Solution

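  • You can try the following, which builds a key from the sorted B values of each A and converts that key to an integer id: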

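    library(dplyr)
    df %>%
        left_join(
            # one row per A: a key built from that group's sorted B values
            (.) %>%
                summarise(group = as.factor(toString(sort(B))), .by = A) %>%
                # the combined factor's integer codes serve as the group ids
                mutate(group = as.integer(group)),
            by = "A"
        )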
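
Regarding the error in the question's EDIT: `.by` cannot be combined with a data frame that is already grouped, so one option is to drop the existing grouping with `ungroup()` first. The following is a minimal sketch rather than a definitive fix, assuming dplyr 1.1.0 or later. The example data is rebuilt from the question, and `keys`, `key`, and `result` are names chosen here for illustration.

```r
library(dplyr)

# Example data from the question, grouped to mimic the asker's situation
df <- data.frame(
  A = c(1, 2, 2, 3, 4, 5, 5),
  B = c("a", "c", "e", "f", "h", "c", "e")
) %>%
  group_by(A)

# One row per A, holding a key made of that group's sorted B values
keys <- df %>%
  ungroup() %>%                                # drop grouping so `.by` is allowed
  summarise(key = toString(sort(B)), .by = A)

result <- df %>%
  ungroup() %>%
  left_join(keys, by = "A") %>%
  mutate(Group = match(key, unique(key))) %>%  # id by first appearance of each key
  select(-key)

print(result)
```

Using `match(key, unique(key))` numbers the keys in order of first appearance, so A values with identical B sets share one id.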
    

    Alternatively, use `components()` and `membership()` from the igraph package, which label the connected components of the graph linking A and B:

    library(dplyr)
    library(igraph)
    df %>%
        mutate(group = {
            (.) %>%
                graph_from_data_frame() %>%
                components() %>%
                membership()
        }[B])
    

    which gives

      A B group
    1 1 a     1
    2 2 c     2
    3 2 e     2
    4 3 f     3
    5 4 h     4
    6 5 c     2
    7 5 e     2
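
The two approaches above agree on the question's data, but they are not equivalent in general: the key-based version groups A values only when their B values are identical, while the igraph version groups A values that are linked through any shared B value. The sketch below illustrates the difference on a hypothetical variation of the data (`df2`, in which A = 6 has only "c"); the names `keys`, `by_key`, `comp`, and `by_component` are illustrative.

```r
library(dplyr)
library(igraph)

# Hypothetical variation on the question's data: A = 6 has only "c"
df2 <- data.frame(
  A = c(1, 2, 2, 5, 5, 6),
  B = c("a", "c", "e", "c", "e", "c")
)

# Key-based ids: A values match only when their sorted B values are identical
keys <- df2 %>% summarise(key = toString(sort(B)), .by = A)
by_key <- match(keys$key, unique(keys$key))
names(by_key) <- keys$A

# Component-based ids: A values match when linked through any shared B value
comp <- components(graph_from_data_frame(df2))$membership
by_component <- comp[as.character(keys$A)]

print(by_key)        # A = 6 gets its own id
print(by_component)  # A = 6 shares an id with A = 2 and A = 5
```

Which behaviour is wanted depends on whether "the same" means identical sets of B values or merely overlapping ones.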
    

    Bonus (for those interested in igraph):

    df %>%
        graph_from_data_frame() %>%
        plot()
    

    shows the groups as separate connected components.