Tags: r, dataframe, select, levels

Select variables/columns in a dataframe by matching given factor levels


Is it possible to select variables in a dataframe by matching certain factor levels, that is, to select columns based on their factor levels (used or unused)? I can summarise by levels or subset by rows, but I wondered whether the columns that have certain factor levels could be selected from the dataframe, or at least listed.

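    library(dplyr)
    library(tibble)  # rowid_to_column() comes from tibble, not dplyr

    height <- c(132,151,162,139,166,147,122)
    weight <- c(48,49,66,53,67,52,40)
    gender <- c("male","male","female","female","male","female","male")
    gender2 <- c("female","male","male","male","male","female","male")
    genderx <- c("xfemale","malex","malex","male","male","xfemale","xfemale")

    # stringsAsFactors = TRUE turns the character columns into factors
    # (this is no longer the default since R 4.0.0)
    df <- data.frame(height, weight, gender, gender2, genderx,
                     stringsAsFactors = TRUE) %>%
      rowid_to_column("ID")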

Something like this (pseudocode, not valid dplyr):

    df %>% select(vars(levels == c("male", "female")))

Solution

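  • We can use select_if, which keeps the columns for which the predicate returns TRUE. Here the predicate requires a factor whose levels include both "male" and "female":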

    library(dplyr)
    df %>% 
        select_if(~ is.factor(.) && all(c("male", "female") %in% levels(.)))
    

    Or use any instead of all to keep the factor columns that have at least one of the given levels:

    df %>% 
        select_if(~ is.factor(.) && any(c("male", "female") %in% levels(.)))
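
As a side note, select_if is superseded since dplyr 1.0.0 in favour of select(where(...)). The sketch below shows the same "all levels present" test in that form, plus a base R way to only list the matching column names, which the question also asked about. The helpers has_levels and has_used_levels are hypothetical names made up for this illustration, not part of dplyr. It assumes dplyr >= 1.0.0 and rebuilds the question's data with stringsAsFactors = TRUE so the text columns are factors.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = TRUE is needed on
# R >= 4.0.0, where character columns are not converted to factors by default.
df <- data.frame(
  height  = c(132, 151, 162, 139, 166, 147, 122),
  weight  = c(48, 49, 66, 53, 67, 52, 40),
  gender  = c("male", "male", "female", "female", "male", "female", "male"),
  gender2 = c("female", "male", "male", "male", "male", "female", "male"),
  genderx = c("xfemale", "malex", "malex", "male", "male", "xfemale", "xfemale"),
  stringsAsFactors = TRUE
)

wanted <- c("male", "female")

# Hypothetical helper: TRUE when x is a factor whose levels
# (used or unused) include every value in `wanted`.
has_levels <- function(x, wanted) {
  is.factor(x) && all(wanted %in% levels(x))
}

# Hypothetical helper: same test, but unused levels are dropped first,
# so only levels that actually occur in the data count.
has_used_levels <- function(x, wanted) {
  is.factor(x) && all(wanted %in% levels(droplevels(x)))
}

# select(where(...)) equivalent of the select_if() call above
selected <- df %>% select(where(~ has_levels(.x, wanted)))

# Base R: list the names of the matching columns without selecting them
matching <- names(df)[vapply(df, has_levels, logical(1), wanted = wanted)]
```

With this data, selected and matching both refer to gender and gender2; genderx is left out because it has "male" but not "female" among its levels. Swapping has_levels for has_used_levels only changes the result when a factor carries levels that never occur in its values.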