R data.table - remove rows corresponding to a given marginal


I have a data.table, a subset M of its columns, and a vector x defined on M.

library(data.table)
data <- matrix(c(0,0,NA,1,0,1,NA,1,0,0,1,0,1,1,NA,NA,1,0,0,1,0,0,1,1,1,0,0,1,NA,0,1,1,0,1,1,1), byrow = TRUE, ncol = 6, dimnames = list(NULL, LETTERS[1:6]))
dt <- data.table(data)
dt
#     A B  C  D  E F
# 1:  0 0 NA  1  0 1
# 2: NA 1  0  0  1 0
# 3:  1 1 NA NA  1 0
# 4:  0 1  0  0  1 1
# 5:  1 0  0  1 NA 0
# 6:  1 1  0  1  1 1

M <- LETTERS[2:5]
x <- dt[2,..M]
x
#    B C D E
# 1: 1 0 0 1

I would like to remove all rows of dt whose values on the columns in M equal x, i.e. rows 2 and 4. Both M and x change during the program. For the given M and x, the result should be:


   A B  C  D  E F
1: 0 0 NA  1  0 1
2: 1 1 NA NA  1 0
3: 1 0  0  1 NA 0
4: 1 1  0  1  1 1

Solution

  • data.table anti-join

    dt[!x, on = M] # also works: dt[!dt[2], on = M]
    
    #    A B  C  D  E F
    # 1: 0 0 NA  1  0 1
    # 2: 1 1 NA NA  1 0
    # 3: 1 0  0  1 NA 0
    # 4: 1 1  0  1  1 1
    

  • Base R

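    # Row-wise test: does the row equal row 2 on every column in M? (may be NA)
    eq2 <- Reduce('&', lapply(dt[, ..M], function(col) col == col[2]))

    # which() drops the NA entries, leaving only the matching rows (2 and 4)
    dt[-which(eq2),]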
    #    A B  C  D  E F
    # 1: 0 0 NA  1  0 1
    # 2: 1 1 NA NA  1 0
    # 3: 1 0  0  1 NA 0
    # 4: 1 1  0  1  1 1
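
Since M and x change during the program, both approaches above can be wrapped in small helpers that take them as arguments. The following is only a sketch under some assumptions: `drop_marginal` and `drop_marginal_base` are hypothetical names (not part of data.table), `target` is assumed to be a one-row data.table containing at least the columns in M, and the base R variant compares against `target` directly instead of hard-coding row 2.

```r
library(data.table)

dt <- data.table(
  A = c(0, NA, 1, 0, 1, 1),
  B = c(0, 1, 1, 1, 0, 1),
  C = c(NA, 0, NA, 0, 0, 0),
  D = c(1, 0, NA, 0, 1, 1),
  E = c(0, 1, 1, 1, NA, 1),
  F = c(1, 0, 0, 1, 0, 1)
)

# Hypothetical helper: anti-join on the columns in M.
drop_marginal <- function(dt, target, M) {
  dt[!target, on = M]
}

# Hypothetical helper: the same filter written with base R comparisons.
drop_marginal_base <- function(dt, target, M) {
  # Compare each column in M with the matching value in target, then AND them.
  eq <- Reduce(`&`, Map(function(col, val) col == val,
                        dt[, ..M], target[, ..M]))
  # Comparisons that come out NA are treated as "no match"; a logical
  # index needs no special case when nothing matches.
  keep <- !(eq %in% TRUE)
  dt[keep]
}

M <- LETTERS[2:5]
x <- dt[2, ..M]
res_join <- drop_marginal(dt, x, M)        # rows 2 and 4 removed
res_base <- drop_marginal_base(dt, x, M)   # same rows removed

# A different marginal: columns A and F, with values taken from row 6
M2 <- c("A", "F")
res2 <- drop_marginal(dt, dt[6], M2)       # only row 6 removed
```

The base R helper keeps rows by logical index rather than `-which(...)`, because in base R a negative index built from an empty `which()` selects nothing rather than everything.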