I have a dataframe that contains three values: 0, 1, and ?. The 0 and 1 values are character values and not numeric. I want to subset the dataframe so as to include only the columns in which the value "0" occurs at least twice and the value "1" occurs at least twice. So in the example data frame below, only columns 4 and 5 would be selected. How do I do this in R?
  x1 x2 x3 x4 x5 x6 x7
1  0  0  1  1  1  1  1
2  0  ?  1  0  1  0  1
3  0  0  1  0  1  0  1
4  0  ?  1  1  0  0  1
5  0  0  1  ?  1  0  0
With select + where:
library(dplyr)
dat %>%
  select(where(~ sum(.x == "0") >= 2 & sum(.x == "1") >= 2))
A base R alternative:
dat[colSums(dat == "1") >= 2 & colSums(dat == "0") >= 2]
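To try the base R approach end to end, here is a minimal sketch that rebuilds the example data by hand. It assumes every value, including "?", is stored as a character string; the names n_zero, n_one, keep and res are just illustrative. The na.rm = TRUE argument only matters if "?" was read in as NA instead (for example via na.strings = "?"), in which case the plain colSums() call would return NA for those columns.

```r
# Rebuild the example data; every value is a character string, including "?".
dat <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  x6 = c("1", "0", "0", "0", "0"),
  x7 = c("1", "1", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Count the "0"s and "1"s in each column.
# na.rm = TRUE is a safeguard in case "?" was imported as NA.
n_zero <- colSums(dat == "0", na.rm = TRUE)
n_one  <- colSums(dat == "1", na.rm = TRUE)

# Keep only the columns that meet both thresholds.
keep <- n_zero >= 2 & n_one >= 2
res <- dat[keep]

names(res)
# [1] "x4"
```

With the rows typed in exactly as printed above, x4 is the only column that reaches both thresholds, since x5 contains a single "0". Indexing with dat[keep] always returns a data frame, even when only one column survives.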