Tags: r, dataframe, dplyr, character, data-wrangling

How do I select columns of a data frame with character values based on how many times a value appears in each column?


I have a data frame that contains three values: "0", "1", and "?". The 0 and 1 values are stored as character strings, not numbers. I want to subset the data frame to include only the columns in which "0" occurs at least twice and "1" also occurs at least twice. In the example data frame below, only column x4 (two "0"s and two "1"s) meets that condition and would be selected. How do I do this in R?

   x1 x2 x3 x4 x5 x6 x7
1  0  0  1  1  1  1  1
2  0  ?  1  0  1  0  1
3  0  0  1  0  1  0  1
4  0  ?  1  1  0  0  1
5  0  0  1  ?  1  0  0
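The answer code below works on a data frame called `dat`, which the question never constructs. The following is a minimal sketch that rebuilds the table, assuming every entry (including "?") is stored as a character string. It also counts the "0"s and "1"s in each column so you can see which columns pass the "at least two of each" test.

```r
# Rebuild the example table; all values, including "?", are character strings
dat <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  x6 = c("1", "0", "0", "0", "0"),
  x7 = c("1", "1", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Count "0" and "1" per column: a 2 x 7 matrix with rows "zeros" and "ones"
counts <- sapply(dat, function(col) {
  c(zeros = sum(col == "0"), ones = sum(col == "1"))
})
print(counts)

# Columns that have at least two of each value
passing <- colnames(counts)[counts["zeros", ] >= 2 & counts["ones", ] >= 2]
print(passing)
```

For this data, x4 is the only column with two or more of both values. x5 and x7 each contain a single "0", and x6 contains a single "1".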

Solution

  • With select() + where() from dplyr (version 1.0.0 or later):

    library(dplyr)
    # keep a column only if it has at least two "0"s and at least two "1"s
    dat %>%
      select(where(~ sum(.x == "0") >= 2 & sum(.x == "1") >= 2))
    

    A base R alternative:

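    # dat == "1" compares every cell and returns a logical matrix;
    # colSums() then counts the TRUE values in each column
    dat[colSums(dat == "1") >= 2 & colSums(dat == "0") >= 2]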
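If the "?" entries are ever read in as NA (for example with read.csv(..., na.strings = "?")), the plain colSums() counts become NA for any column containing an NA, and an NA in the logical index breaks the column selection. Below is a sketch of a hypothetical helper, select_by_counts() (not part of base R or of the original answer), that adds na.rm = TRUE and turns the threshold into a parameter. The helper name and its arguments are illustrative assumptions.

```r
# Hypothetical helper: keep columns in which `zero` and `one` each
# occur at least `min_count` times
select_by_counts <- function(df, zero = "0", one = "1", min_count = 2) {
  # na.rm = TRUE stops NA cells from turning a column's count into NA
  n_zero <- colSums(df == zero, na.rm = TRUE)
  n_one  <- colSums(df == one,  na.rm = TRUE)
  df[n_zero >= min_count & n_one >= min_count]
}

# Example data from the question, stored as character
dat <- data.frame(
  x1 = c("0", "0", "0", "0", "0"),
  x2 = c("0", "?", "0", "?", "0"),
  x3 = c("1", "1", "1", "1", "1"),
  x4 = c("1", "0", "0", "1", "?"),
  x5 = c("1", "1", "1", "0", "1"),
  x6 = c("1", "0", "0", "0", "0"),
  x7 = c("1", "1", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# The same data with "?" recoded as NA
dat_na <- dat
dat_na[dat_na == "?"] <- NA

res    <- select_by_counts(dat)
res_na <- select_by_counts(dat_na)
res_1  <- select_by_counts(dat, min_count = 1)
print(names(res))
print(names(res_1))
```

Both res and res_na keep only x4. Lowering the threshold to min_count = 1 keeps x4 through x7, which are the columns containing at least one "0" and at least one "1".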