Search code examples
rselectsubsetdistinct-values

Is it possible to select columns in r based on any value in the column?


I want to subset my df to include only columns that include a certain value in any row.

for example, if I have:

year = c(1990,1991,1992,1993,1994,1995,1996,1997,1998,1999)
apple = c(1,4,6,8,9,9,2,4,7,4)
orange = c(7,1,5,5,2,1,7,1,3,8)
banana = c(9,9,4,8,1,3,6,7,5,9)
lemon = c(8,3,3,3,2,5,6,7,2,4)
df = data.frame(year,apple,orange,banana,lemon)

df

I want to select only the columns that have a 9 anywhere in the column so that my df would become only include the apple and banana columns.

Is this possible? All the answers I've found so far only enable selecting columns based on the column name, but I want to select based on cell values within the column. Thank you!


Solution

  • We can pass a function in select within where - check whether the column is numeric and if that is numeric, check whether there are any value equal to 9. In addition can change the any(.x ==9) to 9 %in% .x.

    library(dplyr)
    df %>% 
      select(where(~is.numeric(.x) && any(.x == 9)))
    

    -output

     apple banana
    1      1      9
    2      4      9
    3      6      4
    4      8      8
    5      9      1
    6      9      3
    7      2      6
    8      4      7
    9      7      5
    10     4      9