Applying vectorized subsetting across multiple columns in R


I am trying to find a straightforward way to vectorize/generalize the subsetting of a data.frame. Let's assume I have this data.frame:

df <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)

Every column has its own condition, and the goal is to subset df so that only those rows remain where the condition is met for at least one column. I want to find a vectorized subset mechanism that generalizes

df <- subset(df, df[,1]<2 | df[,2]< 30 | df[,3]<100)

so I could formulate it somewhat like this

crit <- c(2,30,100)
df <- subset(df, df$header < crit[1:3])

and, down the road, get to the general case for n columns:

df <- subset(df, df$header < crit[1:n])

I know a multi-step loop workaround, but there must be another way. I am grateful for any help.


Solution

  • Given:

    x <- c(1:5)
    y <- c(10,20,30,40,50)
    z <- c(100,200,300,400,500)
    
    # df is already the name of a function in R (stats::df, the F density), so use another name
    mydf <- data.frame(A = x, B = y, C = z)
    
    crit <- c(2,30,100)
    

    Then this will show you which values in each column are less than that column's crit value:

    > sweep(mydf, 2, crit, "<")
             A     B     C
    [1,]  TRUE  TRUE FALSE
    [2,] FALSE  TRUE FALSE
    [3,] FALSE FALSE FALSE
    [4,] FALSE FALSE FALSE
    [5,] FALSE FALSE FALSE
    

    And this will give you the rows that meet any of the criteria:

    > subset(mydf, rowSums(sweep(mydf, 2, crit, "<")) > 0)
    
      A  B   C
    1 1 10 100
    2 2 20 200
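
To get to the general `crit[1:n]` case from the question, the two steps above can be wrapped in a small function. This is only a sketch: `subset_any_below` and its `cols` argument are made-up names for illustration, and it assumes the selected columns are numeric, contain no NA values, and that `crit` holds one threshold per selected column, in the same order.

```r
# Hypothetical helper (not part of the original answer): keep the rows where
# at least one of the chosen columns is below its own threshold.
subset_any_below <- function(data, crit, cols = seq_along(crit)) {
  # Assumption: one threshold per selected column, in matching order.
  stopifnot(length(cols) == length(crit))
  # Compare column cols[i] against crit[i]; the result is a logical matrix.
  below <- sweep(as.matrix(data[cols]), 2, crit, "<")
  # rowSums() counts the TRUEs per row; > 0 means "any criterion met".
  data[rowSums(below) > 0, , drop = FALSE]
}

mydf <- data.frame(A = 1:5, B = 10 * 1:5, C = 100 * 1:5)
result <- subset_any_below(mydf, c(2, 30, 100))
print(result)
```

Passing `cols` explicitly, for example `subset_any_below(mydf, c(2, 30), cols = c("A", "B"))`, limits the check to a subset of columns. Replacing `> 0` with `== length(crit)` would instead require every criterion to hold.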