
How to subset a large dataframe using many conditions for many variables in a simple way?


I have this data frame (but imagine it with many more columns/variables):

df = data.frame(x = c(0,0,0,1,0),
                y = c(1,1,1,0,0),
                z = c(1,1,0,0,1))

I want to subset this dataset to the rows where x = 1 and (y = 0 or z = 0 or any of the other variables equals 0).

I already know the basic approach that works for small datasets (writing each condition out by hand), but I want something that scales to datasets with many variables. Thanks!


Solution

  • You can use Reduce(). Adding logical vectors with + works like an OR: TRUE is coerced to 1 and FALSE to 0, so the sum is > 0 whenever at least one of the values is TRUE.

    Correspondingly, * works like an AND, since the product is > 0 only if all of the values are TRUE.

    df = data.frame(x = c(0,0,0,1,0),
                    y = c(1,1,1,0,0),
                    z = c(1,1,0,0,1))
    nms <- names(df)
    
    # take all variables except for `x`
    nms_rel <- setdiff(nms, "x")
    nms_rel
    #> [1] "y" "z"
    
    # filter all rows in which `x` is 1 AND any other variable is 0
    df[df$x == 1 & Reduce(`+`, lapply(df[nms_rel], `==`, 0)) > 0, ]
    #>   x y z
    #> 4 1 0 0
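If you want the same filter as a reusable function for wider data frames, here is a minimal sketch. filter_key_and_any() is a hypothetical helper, not part of the answer above or of any package. It applies Reduce() with `|`, which is a direct logical OR and therefore needs no > 0 comparison, and it also shows rowSums() as an alternative way to count how many of the other columns match in each row.

```r
# Hypothetical helper (illustration only): keep rows where column `key`
# equals `key_value` AND at least one of the remaining columns equals `target`.
filter_key_and_any <- function(data, key, key_value = 1, target = 0) {
  others <- setdiff(names(data), key)
  # One logical vector per remaining column
  hits <- lapply(data[others], `==`, target)
  # Reduce with `|` gives a logical OR across all remaining columns
  any_hit <- Reduce(`|`, hits)
  # which() drops rows whose condition is NA (e.g. from missing values)
  data[which(data[[key]] == key_value & any_hit), , drop = FALSE]
}

df <- data.frame(x = c(0, 0, 0, 1, 0),
                 y = c(1, 1, 1, 0, 0),
                 z = c(1, 1, 0, 0, 1))

res <- filter_key_and_any(df, "x")
res
#>   x y z
#> 4 1 0 0

# Alternative: compare all other columns at once (a logical matrix),
# then count the matches per row with rowSums()
res2 <- df[df$x == 1 & rowSums(df[setdiff(names(df), "x")] == 0) > 0, ]

# For an AND across columns (all other columns equal 0), use `&` instead of `|`
and_rows <- Reduce(`&`, lapply(df[c("y", "z")], `==`, 0))
```

Swapping `|` for `&` in the helper turns the "any other column" condition into "all other columns", mirroring the + versus * distinction in the answer.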