
Non-unique results matching criteria for a data.frame in R


I am using R to pull a set of rows from a data frame, and many of the rows should be pulled repeatedly. The rows are chosen using two criteria. Unfortunately, the result contains only the unique rows matching the criteria, without the repeats. I shall demonstrate...

Given the data.frame:

a = data.frame(array(c(1,2,3,1,4,5,6,2,7,8,9,4), c(4,3)))

Which will look like:

  X1 X2 X3
1  1  4  7
2  2  5  8
3  3  6  9
4  1  2  4

Let's suppose I wish to subset a using two sets of criteria, defined by vectors that are paired element by element:

criteriaX1 = c(1,2,1,1,2)
criteriaX2 = c(4,5,4,2,5)

Then I would use this command:

a[ a$X1 %in% criteriaX1 & a$X2 %in% criteriaX2, ]

I was hoping to get 5 rows, one per pair of criteria, like so (use criteriaX1 as the key and read down X1):

  X1 X2 X3
1  1  4  7
2  2  5  8
3  1  4  7
4  1  2  4
5  2  5  8

But instead I just got this:

  X1 X2 X3
1  1  4  7
2  2  5  8
4  1  2  4

I'm guessing it has something to do with %in% defining Set Membership, but I'm not sure how to get around this without an obnoxious loop. All assistance is appreciated.

Thanks.


Solution

  • You could use a data.table equi-join:

    library(data.table)
    a <- data.table(a)
    b <- data.table(X1 = criteriaX1, X2 = criteriaX2)
    
    setkey(a, X1, X2)
    a[b]
    #    X1 X2 X3
    # 1:  1  4  7
    # 2:  2  5  8
    # 3:  1  4  7
    # 4:  1  2  4
    # 5:  2  5  8
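
  • If you would rather stay in base R, one possible sketch is to look up each (X1, X2) criteria pair with match() and index by the resulting positions. This is not part of the join above. The names mask, idx and result are made up for illustration, and it assumes each (X1, X2) pair occurs at most once in a, because match() only returns the first hit (and NA when a pair is not found).

```r
# Data and criteria copied from the question.
a <- data.frame(array(c(1,2,3,1,4,5,6,2,7,8,9,4), c(4,3)))
criteriaX1 <- c(1,2,1,1,2)
criteriaX2 <- c(4,5,4,2,5)

# %in% yields a single TRUE/FALSE per row of a, so logical indexing
# can keep each row at most once, however often it appears in the criteria.
mask <- a$X1 %in% criteriaX1 & a$X2 %in% criteriaX2

# Build one text key per (X1, X2) pair and look up every criteria pair.
# match() gives the position of the first matching row of a, or NA.
idx <- match(paste(criteriaX1, criteriaX2), paste(a$X1, a$X2))

# Indexing by position keeps the repeats and the order of the criteria.
result <- a[idx, ]
result
```

    Because the same position can appear several times in idx, the repeated rows come back with suffixed row names rather than a fresh 1 to 5 numbering.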