I want to find rows that contain the same values across two or three columns. Here is an example dataset:
replicate(3, {sample(1:3)})
     [,1] [,2] [,3]
[1,]    3    3    2
[2,]    2    1    1
[3,]    1    2    3
For this dataset, the first and second rows contain duplicated values (3 and 1, respectively), so I want to identify and remove them, keeping only the rows whose values are all distinct (the third row in this case).
How can I achieve that? My actual dataset is larger. I would appreciate any help!
Using m from the Note at the end, apply anyDuplicated to each row and use the result to subset the rows. anyDuplicated returns 0 if there are no duplicates and the index of the first duplicate otherwise. The exclamation mark (!) coerces 0 to FALSE and any other value to TRUE and then negates the result, so only rows without duplicates are kept.
m[!apply(m, 1, anyDuplicated), , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    2    3
or
subset(m, !apply(m, 1, anyDuplicated))
##      [,1] [,2] [,3]
## [1,]    1    2    3
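To make the intermediate steps visible, here is a minimal sketch that splits the one-liner into parts and also pulls out the rows being discarded, since the question asks for both. The names dup_index, has_dup, dropped and kept are only illustrative and are not part of the original answer.

```r
# Same matrix as in the Note (rows: 3 3 2 / 2 1 1 / 1 2 3)
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)

# Index of the first duplicated element in each row, or 0 if there is none
dup_index <- apply(m, 1, anyDuplicated)
dup_index
## [1] 2 3 0

# Illustrative name: TRUE for rows that contain a repeated value
has_dup <- dup_index > 0

# Rows to discard (at least one repeated value)
dropped <- m[has_dup, , drop = FALSE]

# Rows to keep (all values distinct)
kept <- m[!has_dup, , drop = FALSE]
kept
##      [,1] [,2] [,3]
## [1,]    1    2    3
```

Keeping drop = FALSE matters when only one row survives, as here: without it the result would collapse to a plain vector instead of a one-row matrix.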
Note
This is the same matrix as shown in the question, but constructed explicitly rather than with random numbers so that it is reproducible.
m <- matrix(c(3, 2, 1, 3, 1, 2, 2, 1, 3), 3)