
Count 2 factors regardless of order


How can I count pairs of values across two factor columns, regardless of order? Both columns contain many of the same values.

dplyr's group_by() %>% count() and group_by() %>% tally() count permutations: (A, B) and (B, A) are treated as two different pairs.

Is there an option or method to perform combination counts instead?

Input dataframe:

Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
DF <- data.frame(Factor1,Factor2)

Desired result:

CoFactors <- c('AB','BC','BB','DE')
n <- c(3,2,1,1)
Result <- data.frame(CoFactors,n)
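
To make the problem concrete, here is a minimal sketch of the order-sensitive count on the same input. It uses base R rather than dplyr only to stay dependency-free; `ordered_counts` and `Pair` are names made up for this illustration.

```r
Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')

# Pasting the columns in their original order keeps 'AB' and 'BA' apart,
# which is the behaviour the question wants to avoid.
ordered_counts <- as.data.frame(table(Pair = paste0(Factor1, Factor2)),
                                responseName = 'n',
                                stringsAsFactors = FALSE)
ordered_counts
```

Here 'AB' and 'BA' end up in separate rows, whereas the desired result merges them into a single 'AB' row.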

Solution

  • In base R, sort the two values in each row, paste them into a single key, and tabulate the keys:

    data.frame(table(apply(DF, 1, function(x)paste0(sort(x), collapse = ''))))
      Var1 Freq
    1   AB    3
    2   BB    1
    3   BC    2
    4   DE    1
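
If the output should match the shape asked for in the question (columns named CoFactors and n, most frequent pair first), the same idea can be extended as in the sketch below. The intermediate name `key` is made up for illustration, and the descending sort is an assumption based on the order shown in the desired result.

```r
Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
DF <- data.frame(Factor1, Factor2)

# Build an order-independent key per row: sort the two values, then paste them.
key <- apply(DF, 1, function(x) paste0(sort(x), collapse = ''))

# Tabulate the keys; naming the table argument and setting responseName
# yields the column names CoFactors and n directly.
Result <- as.data.frame(table(CoFactors = key),
                        responseName = 'n',
                        stringsAsFactors = FALSE)

# Most frequent pairs first (ties keep their alphabetical order).
Result <- Result[order(-Result$n), ]
Result
```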
    

    or, with dplyr, using pmin() and pmax() to put each pair in a consistent order:

    library(dplyr)

    DF %>%
      # pmin()/pmax() take the alphabetically smaller/larger value in each row
      mutate(Factor = pmin(Factor1, Factor2),
             Factor2 = pmax(Factor1, Factor2)) %>%
      group_by(Factor, Factor2) %>%
      count()
    
    # A tibble: 4 x 3
    # Groups:   Factor, Factor2 [4]
      Factor Factor2     n
      <chr>  <chr>   <int>
    1 A      B           3
    2 B      B           1
    3 B      C           2
    4 D      E           1
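
The pmin()/pmax() idea also works without dplyr and avoids the row-by-row apply() call. The following is a minimal sketch, not a drop-in replacement. The helper names `f1` and `f2` are made up, and the as.character() conversion is a precaution in case the columns really are factors (the default for data.frame() before R 4.0), since plain unordered factors do not support the comparisons pmin() and pmax() rely on.

```r
Factor1 <- c('A','A','B','B','C','B','D')
Factor2 <- c('B','B','A','C','B','B','E')
DF <- data.frame(Factor1, Factor2)

# Convert to character so the comparison is well defined even for factor columns.
f1 <- as.character(DF$Factor1)
f2 <- as.character(DF$Factor2)

# pmin()/pmax() pick the alphabetically smaller/larger value row by row,
# so ('A','B') and ('B','A') both produce the key 'AB'.
CoFactors <- paste0(pmin(f1, f2), pmax(f1, f2))

# Count each key; the result has columns CoFactors and n.
Result <- as.data.frame(table(CoFactors),
                        responseName = 'n',
                        stringsAsFactors = FALSE)
Result
```

Because pmin() and pmax() are vectorized, this version should scale better to large data frames than sorting each row separately.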