Trying to use the filter function in R to select only the values needed in multiple columns while also keeping missing values.
x <- 1:2:3:4:5:NA
y <- 3:4:NA:5:6:NA
z <- 2:3:4:NA:5:6
df <- data.frame(x, y, z)
df %>%
filter(x != 1, 2, 3 | is.na(x))
I am trying to keep only values greater than 4 in columns x, y and z, while also keeping NA. The attempt above gives the error 'input must be logical vector, not a double'. Any suggestions on how to fix this error, and on how to apply the filter to all three columns?
First of all, please provide a reproducible example. Chaining the : operator as in the question does not create these vectors, so use c() instead:
x <- c(1:5, NA)
y <- c(3:4, NA, 5:6, NA)
z <- c(2:4, NA, 5:6)
Then I would recommend using the package {data.table}.
library(data.table)
dt <- data.table(x, y, z)
You can then apply filters like so:
dt[x >= 4 | is.na(x), ]
(Meaning: give me all rows of the table where x is greater than or equal to 4, or where x is NA. Use > instead of >= if you need values strictly greater than 4.)
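To see why the is.na(x) part is needed, here is a minimal sketch using only the x column from the example. A comparison against NA evaluates to NA, and data.table treats NA in the row filter as FALSE, so the NA row disappears unless you ask for it explicitly. The names without_na and with_na are only illustrative.

```r
library(data.table)

x <- c(1:5, NA)
dt <- data.table(x)

# Comparing NA with a number gives NA, not TRUE or FALSE.
cmp <- x >= 4

# data.table treats NA in the row filter as FALSE, so the NA row is dropped.
without_na <- dt[x >= 4]

# is.na(x) is TRUE for the missing value, and NA | TRUE is TRUE,
# so the NA row is kept.
with_na <- dt[x >= 4 | is.na(x)]

print(without_na)
print(with_na)
```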
You can combine further logical constraints. For example, to keep every row in which at least one of the three columns meets the condition:
dt[(x >= 4 | is.na(x)) | (y >= 4 | is.na(y)) | (z >= 4 | is.na(z)), ]
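If you would rather not type the condition once per column, one possible approach is to compute it for every column and then combine the results. This is only a sketch: keep_value, flags, any_col and all_cols are names made up for illustration. Reduce with | keeps a row when any column qualifies (as in the line above), while Reduce with & keeps it only when all columns qualify.

```r
library(data.table)

dt <- data.table(
  x = c(1:5, NA),
  y = c(3:4, NA, 5:6, NA),
  z = c(2:4, NA, 5:6)
)

# Hypothetical helper: TRUE where a value is >= 4 or missing.
keep_value <- function(col) col >= 4 | is.na(col)

# One logical vector per column.
flags <- lapply(dt, keep_value)

# Combine element-wise across columns.
any_col  <- Reduce(`|`, flags)  # at least one column qualifies
all_cols <- Reduce(`&`, flags)  # every column qualifies

print(dt[any_col])
print(dt[all_cols])
```

Which of the two you want depends on whether "in columns x, y and z" means any of them or all of them.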
Further information on the {data.table} syntax can be found here: https://rdatatable.gitlab.io/data.table/
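If you prefer to stay with dplyr, the error in the question arises because each comma-separated argument to filter() is treated as a separate condition, so the bare 2 is passed as a condition that is a number rather than a logical vector. A sketch of the same filters in dplyr follows, assuming a reasonably recent dplyr version that provides if_all() and if_any(). The result names one_col, all_cols and any_col are only illustrative.

```r
library(dplyr)

df <- data.frame(
  x = c(1:5, NA),
  y = c(3:4, NA, 5:6, NA),
  z = c(2:4, NA, 5:6)
)

# Single column: one logical expression instead of comma-separated values.
one_col <- df %>%
  filter(x >= 4 | is.na(x))

# Every listed column must be >= 4 or NA.
all_cols <- df %>%
  filter(if_all(c(x, y, z), ~ .x >= 4 | is.na(.x)))

# At least one listed column must be >= 4 or NA.
any_col <- df %>%
  filter(if_any(c(x, y, z), ~ .x >= 4 | is.na(.x)))

print(one_col)
print(all_cols)
print(any_col)
```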