
Identifying duplicate pairs of values, even when their order is reversed


Given the data frame

df <- data.frame(A = c(LETTERS[1:6], "A"),
                 B = c(rev(LETTERS[1:6]), "F"))

how can I count the rows whose pair of values is unique, i.e. occurs only once? The function needs to treat reversed pairs (e.g. A/F and F/A) as the same pair.

In the example above, there aren't any unique letter combinations (3 x A/F, 2 x B/E, 2 x C/D), so the answer is "0".

(Letters can be replaced by any string of characters or factor levels)


Solution

  • You can sort the values within each row first (with sort), so that reversed pairs become identical, and then count the resulting pairs (with table). I use paste0(sort(x), collapse = "") to build a single key string out of each ordered pair of values.

    (tab <- table(apply(df, 1, \(x) paste0(sort(x), collapse = ""))))
    # AF BE CD 
    #  3  2  2 
    
    sum(tab == 1)
    #[1] 0
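
    Since the question mentions arbitrary strings or factor levels, here is a hedged variant of the same idea. It is only a sketch, not a replacement for the code above: it builds the key with pmin/pmax instead of apply, and it puts a separator between the two values so that pairs such as "ab"/"c" and "a"/"bc" do not collapse into the same key. The separator "|" is my assumption and must not occur in the data; the names a, b, key, counts, n_unique and n_dup_rows are made up for illustration.

    ```r
    df <- data.frame(A = c(LETTERS[1:6], "A"),
                     B = c(rev(LETTERS[1:6]), "F"))

    # Convert to character so that factor columns are handled too
    a <- as.character(df$A)
    b <- as.character(df$B)

    # Order-independent key: smaller value first, larger value second.
    # Assumption: "|" never appears in the values themselves.
    key <- paste(pmin(a, b), pmax(a, b), sep = "|")

    counts <- table(key)

    # Number of pairs that occur exactly once
    n_unique <- sum(counts == 1)

    # Number of rows that belong to a repeated pair
    n_dup_rows <- sum(counts[counts > 1])
    ```

    For the example data, n_unique is 0 and n_dup_rows is 7, which matches the table above.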