
Using filter to select specific values of multiple columns in R


I am trying to use the filter function in R to keep only the values I need in multiple columns, while also keeping missing values.

x <- 1:2:3:4:5:NA
y <- 3:4:NA:5:6:NA
z <- 2:3:4:NA:5:6
df <- data.frame(x, y, z)

df %>% 
  filter(x !=  1, 2, 3 | is.na(x))
  

I am trying to filter for values more than 4 in columns x, y and z while keeping NA. The attempt above gives the error 'input must be logical vector, not a double'. Any suggestions for fixing this error, and for applying the same filter to all three columns?


Solution

  • First of all, please provide a reproducible example. Chaining `:` as in 1:2:3:4:5:NA does not build the intended vector; use c() instead:

    x <- c(1:5, NA)
    y <- c(3:4, NA, 5:6, NA)
    z <- c(2:4, NA, 5:6)
    

    Then I would recommend using the package {data.table}.

    library(data.table)
    dt <- data.table(x, y, z)
    

    You can then apply filters like so:

    dt[x >= 4 | is.na(x), ]
    

    (This returns all rows of the table where x is greater than or equal to 4 or where x is NA.)
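
    If you would rather stay with dplyr, as in the question, the same single-column filter can be sketched as follows. The original error most likely comes from the commas: filter() treats each comma-separated argument as its own condition, so the bare `2` in `x != 1, 2, 3 | is.na(x)` is passed as a double instead of a logical vector. This is a minimal sketch that assumes dplyr is installed; `df` is rebuilt here so the block runs on its own, and `res_x` is just an illustrative name.

    ```r
    library(dplyr)

    # Same data as in the reproducible example, built as a data.frame
    df <- data.frame(
      x = c(1:5, NA),
      y = c(3:4, NA, 5:6, NA),
      z = c(2:4, NA, 5:6)
    )

    # One logical expression per condition: keep rows where x is at least 4
    # or where x is missing. The is.na() term is needed because filter()
    # drops rows for which the condition evaluates to NA.
    res_x <- df %>%
      filter(x >= 4 | is.na(x))

    print(res_x)
    ```

    Without the is.na(x) term, the row where x is NA would be dropped, because a comparison with NA yields NA and filter() keeps only rows where the condition is TRUE.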

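    You can further combine other logical constraints. Note that the expression below joins the per-column conditions with `|`, so a row is kept if any one of the three columns passes; replace the outer `|` with `&` to require that every column passes: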

    dt[(x >= 4 | is.na(x)) | (y >= 4 | is.na(y)) | (z >= 4 | is.na(z)), ]
    

    Further information on the {data.table} syntax can be found here: https://rdatatable.gitlab.io/data.table/
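
    For completeness, here is a hedged dplyr sketch of the multi-column case, since the question started from dplyr. It assumes dplyr 1.0.4 or later, where if_all() and if_any() are available; `keep_value` is a hypothetical helper name introduced only for this example, and `df` is rebuilt so the block is self-contained.

    ```r
    library(dplyr)

    df <- data.frame(
      x = c(1:5, NA),
      y = c(3:4, NA, 5:6, NA),
      z = c(2:4, NA, 5:6)
    )

    # Hypothetical helper: TRUE where a value is at least 4 or missing
    keep_value <- function(v) v >= 4 | is.na(v)

    # Keep rows where every one of x, y, z passes the check
    all_cols <- df %>%
      filter(if_all(c(x, y, z), keep_value))

    # Keep rows where at least one of x, y, z passes the check
    # (the same logic as joining the per-column conditions with `|`)
    any_col <- df %>%
      filter(if_any(c(x, y, z), keep_value))

    print(all_cols)
    print(any_col)
    ```

    Which of the two you want depends on whether the condition should hold for all three columns at once or for any single one of them.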