Tags: r, data-manipulation, data-extraction

Find rows that contain the same values across two or three columns


I want to find rows that contain the same values across two or three columns. Here is an example dataset:

replicate(3, {sample(1:3)})
     [,1] [,2] [,3]
[1,]    3    3    2
[2,]    2    1    1
[3,]    1    2    3

For this dataset, the first and second rows contain duplicated values (3 and 1, respectively), so I want to extract and discard them and keep only the rows without duplicated values (here, the third row).

How can I achieve that? My real dataset is larger. I would appreciate any help!


Solution

  • Using m from the Note at the end, apply anyDuplicated to each row and use the result to subset the rows. anyDuplicated returns 0 if a row has no duplicates and the index of the first duplicated element otherwise. The exclamation mark (!) coerces 0 to FALSE and any other value to TRUE, then negates it, so only the rows without duplicates are kept.

    m[!apply(m, 1, anyDuplicated), , drop = FALSE]
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    

    or

    subset(m, !apply(m, 1, anyDuplicated))
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    

    Note

    This is the same matrix as shown in the question, but entered directly rather than generated with random numbers so that the example is reproducible.

    m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)
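Since the question also asks to extract the rows with duplicates before discarding them, it may help to look at the intermediate result of apply and split the matrix both ways. The following is a minimal sketch using the same m; the names dup_idx, has_dup, rows_with_dups and rows_unique are illustrative and not part of the answer above.

```r
# Same matrix as in the Note
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)

# For each row: index of the first repeated value, or 0 if there is none
dup_idx <- apply(m, 1, anyDuplicated)
dup_idx
## [1] 2 3 0

# One logical flag per row: TRUE when the row repeats a value
has_dup <- dup_idx > 0

# Rows to set aside (contain a repeated value)
rows_with_dups <- m[has_dup, , drop = FALSE]

# Rows to keep (all values distinct)
rows_unique <- m[!has_dup, , drop = FALSE]
```

Keeping has_dup as a separate vector means the same flag can be reused to count the affected rows with sum(has_dup) or to find their positions with which(has_dup).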