Creating a df of unique combinations of columns in R where order doesn't matter


I want to create a data frame with all of the unique combinations of three columns, where the order of the values doesn't matter. In my example, I want to list every combination of ideologies that a group of three people could have.

For example, "No opinion", "Moderate", "Conservative" is the same as "Conservative", "No opinion", "Moderate", which is the same as "Moderate", "No opinion", "Conservative", and so on. All of these orderings should be represented by one row.

I've seen similar threads about using distinct() for home and away sports teams, but I don't think that approach works for this problem.

library(tidyverse)

political_spectrum_values <-
  factor(c("Far left",
           "Liberal",
           "Moderate",
           "Conservative",
           "Far right",
           "No opinion"),
         # No `levels` given, so the ordered levels default to alphabetical order
         ordered = TRUE)


political_groups_of_3 <-
  crossing(first_person = political_spectrum_values,
           second_person = political_spectrum_values,
           third_person = political_spectrum_values)

I've considered creating a combined variable by piping the result into the line below, but I'm not sure where to go from there:

unite(col = "group_composition", c(first_person, second_person, third_person), sep = "_")

EDIT: After working on this problem longer, I've reshaped the data in a way that might make this easier:

crossing(first_person = political_spectrum_values, 
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>% 
  mutate(group_n = row_number()) %>% 
  pivot_longer(cols = c(first_person, second_person, third_person), 
               values_to = "ideology", 
               names_to = "group") %>% 
  select(-group)
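One way the long format from the EDIT could be carried through is to sort the ideologies within each group_n, paste them into one label, and keep the distinct labels. This is only a sketch: it assumes plain character values instead of the ordered factor (so that sort() is alphabetical), and the names group_compositions and group_composition are made up for illustration.

```r
library(dplyr)  # both packages are part of the tidyverse
library(tidyr)

# Assumption: character values rather than the ordered factor from the question
political_spectrum_values <- c("Far left", "Liberal", "Moderate",
                               "Conservative", "Far right", "No opinion")

group_compositions <-
  crossing(first_person = political_spectrum_values,
           second_person = political_spectrum_values,
           third_person = political_spectrum_values) %>%
  mutate(group_n = row_number()) %>%
  pivot_longer(cols = c(first_person, second_person, third_person),
               values_to = "ideology",
               names_to = "group") %>%
  select(-group) %>%
  # One label per group: sorting first makes every ordering of the same
  # three ideologies collapse to the same string
  group_by(group_n) %>%
  summarise(group_composition = paste(sort(ideology), collapse = "_"),
            .groups = "drop") %>%
  # Keep each composition once
  distinct(group_composition)
```

The separator "_" matches the one used in the unite() attempt; any string that does not occur inside the labels would do.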

Solution

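  • A base R method is to create all combinations of political_spectrum_values taken 3 at a time using expand.grid, sort the values within each row, and keep the unique rows. Sorting puts every ordering of the same three values into one canonical form, so unique can collapse them.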

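    # All 6^3 = 216 ordered triples
    df <- expand.grid(first_person = political_spectrum_values, 
                      second_person = political_spectrum_values, 
                      third_person = political_spectrum_values)
    
    # Sort the values within each row so every ordering of a group looks identical
    df[] <- t(apply(df, 1, sort))
    # Keep one row per distinct group
    unique(df)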
    

    If the result is needed as a single string per group:

    unique(apply(df, 1, function(x) paste0(sort(x), collapse = "_")))
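Since the question is tagged tidyverse, the same sort-then-deduplicate idea can probably be written with dplyr and tidyr as well. The sketch below is not part of the answer above; it assumes character values rather than the ordered factor, and unique_groups and group_composition are illustrative names. It ends with a count check: choosing 3 values out of 6 with repetition and without regard to order gives choose(6 + 3 - 1, 3) = 56 groups.

```r
library(dplyr)  # both packages are part of the tidyverse
library(tidyr)

# Assumption: character values rather than the ordered factor from the question
political_spectrum_values <- c("Far left", "Liberal", "Moderate",
                               "Conservative", "Far right", "No opinion")

unique_groups <-
  crossing(first_person = political_spectrum_values,
           second_person = political_spectrum_values,
           third_person = political_spectrum_values) %>%
  # Work one row at a time so sort() sees only that row's three values
  rowwise() %>%
  mutate(group_composition = paste(sort(c(first_person, second_person, third_person)),
                                   collapse = "_")) %>%
  ungroup() %>%
  # Orderings of the same group now share a label, so keep each label once
  distinct(group_composition)

# Sanity check: number of multisets of size 3 drawn from 6 values
n_values <- length(political_spectrum_values)
expected_rows <- choose(n_values + 3 - 1, 3)  # 56
stopifnot(nrow(unique_groups) == expected_rows)
```

rowwise() is slower than the apply-based version on large inputs, but for 216 rows the difference should not matter.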