
Row frequency of a data frame ignoring column order in R


I want to build a frequency table for the rows of a data frame.

I have found how to do this, but only in a way that takes the order of the columns into account. I want the frequencies to ignore column order.

For example, given:

0   A       B     
1   B       A     
2   C       D      
3   D       C     
4   C       D

I wish to obtain:

A B 2
C D 3

Thanks in advance.


Solution

  • We can use pmin/pmax to build an order-independent grouping key from the two columns. This should be more efficient than sorting each row.

    library(dplyr)
    df %>%
       count(V2N = pmin(V2, V3), V3N = pmax(V2, V3))
    # A tibble: 2 x 3
    #  V2N   V3N       n
    #   <chr> <chr> <int>
    #1 A     B         2
    #2 C     D         3
    

    Benchmarks

    # Repeat the 5-row example 1e6 times (5e6 rows) for timing
    df1 <- df[rep(seq_len(nrow(df)), 1e6), ]

    # pmin/pmax approach
    system.time({
      df1 %>%
        count(V2N = pmin(V2, V3), V3N = pmax(V2, V3))
    })
    #   user  system elapsed 
    #  1.164   0.043   1.203 
    
    
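    # Row-wise sort approach, for comparison
    system.time({
      # Sort the values within each row (dropping the V1 index column)
      df2 <- data.frame(t(apply(df1[-1], 1, sort)))

      # Count identical sorted rows
      df2 %>%
        group_by_all() %>%
        summarise(Freq = n())
    })
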
    #   user  system elapsed 
    # 160.357   1.227 161.544 
    

    data

    df <- structure(list(V1 = 0:4, V2 = c("A", "B", "C", "D", "C"), V3 = c("B", 
      "A", "D", "C", "D")), row.names = c(NA, -5L), class = "data.frame")
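
If dplyr is not available, the same pmin/pmax idea can be sketched in base R. This is a minimal illustration, not part of the original answer: the names `lo`, `hi`, and `res` are made up for the example, and using `aggregate()` to do the counting is one assumption among several possible choices.

```r
# Same example data as in the answer
df <- data.frame(
  V1 = 0:4,
  V2 = c("A", "B", "C", "D", "C"),
  V3 = c("B", "A", "D", "C", "D"),
  stringsAsFactors = FALSE
)

# pmin()/pmax() work elementwise, so each row's pair ends up in a
# fixed order regardless of which column a value came from
lo <- pmin(df$V2, df$V3)
hi <- pmax(df$V2, df$V3)

# Count identical (lo, hi) pairs by summing a column of ones
res <- aggregate(n ~ lo + hi, data = data.frame(lo, hi, n = 1L), FUN = sum)
print(res)
```

For the example data this should give one row for the A/B pair with a count of 2 and one for the C/D pair with a count of 3, matching the table requested in the question.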