
How would I automate dropping a column in R based on summary data for that column?


I have a dataset that feeds an automated dashboard. Essentially, it looks at the relationship between certain conditions and the cost of care, month by month, for a health care institution. In pseudocode, what I want to do is:

dataset %>% select(-c("columns where the average value is lower than X"))

No amount of googling seems to be getting me close.
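The pseudocode above maps fairly directly onto dplyr 1.0.0 or later, where the where() selection helper can be negated inside select(). This is a minimal sketch, not the asker's code. It assumes a small illustrative data frame and a threshold named val. It also assumes that NA values should be ignored when computing each mean (na.rm = TRUE). Non-numeric columns fail is.numeric() and are therefore kept.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides where()

# Illustrative example data
dataset <- data.frame(col1 = c("A", "B", "C"), col2 = c(10, 8, 15),
                      col3 = c(3, 4, 7), stringsAsFactors = FALSE)
val <- 10  # illustrative threshold

# Drop every numeric column whose mean (ignoring NAs) is below val.
# isTRUE() turns an NA/NaN comparison (e.g. an all-NA column) into FALSE,
# so where() always receives a single TRUE or FALSE and such columns are kept.
result <- dataset %>%
  select(-where(~ is.numeric(.x) && isTRUE(mean(.x, na.rm = TRUE) < val)))

print(result)
```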


Solution

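  • We can use select_if (superseded by select(where(...)) since dplyr 1.0.0). Note that, as written, both snippets below return the numeric columns whose mean is below val, i.e. the columns to be dropped; negate the condition to drop them instead.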

    library(dplyr)
    val <- 10
    dataset %>%
        select_if(~ is.numeric(.) && mean(.) < val)
    

    Or using base R

    # is.numeric() also catches integer columns, which class() == "numeric" misses
    num_cols <- sapply(dataset, is.numeric)
    dataset[, names(which(colMeans(dataset[num_cols]) < val)), drop = FALSE]
    #   col3
    #1    3
    #2    4
    #3    7
    

    data

    dataset <- data.frame(col1 = c('A', 'B', 'C'), col2 = c(10, 8, 15),
         col3 = c(3, 4, 7), stringsAsFactors = FALSE)
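Because the dashboard is automated, it may help to wrap the drop logic in a small reusable function. The sketch below uses base R only. The function name drop_low_mean_cols and its threshold and na.rm arguments are hypothetical and do not come from any package. Unlike the snippets above, which return the low-mean columns, this version drops them and keeps every other column, including non-numeric ones.

```r
# Hypothetical helper: drop numeric columns whose mean is below `threshold`,
# leaving non-numeric columns untouched.
drop_low_mean_cols <- function(df, threshold, na.rm = TRUE) {
  # One TRUE/FALSE per column: TRUE means "numeric and mean below threshold".
  # isTRUE() treats an NA/NaN mean (e.g. an all-NA column) as FALSE, so it is kept.
  low_mean <- vapply(df, function(col) {
    is.numeric(col) && isTRUE(mean(col, na.rm = na.rm) < threshold)
  }, logical(1))
  df[, !low_mean, drop = FALSE]
}

dataset <- data.frame(col1 = c("A", "B", "C"), col2 = c(10, 8, 15),
                      col3 = c(3, 4, 7), stringsAsFactors = FALSE)

result <- drop_low_mean_cols(dataset, threshold = 10)
print(result)
```

With this data, col3 (mean about 4.67) is dropped, while col1 (non-numeric) and col2 (mean 11) are kept. Using vapply() rather than sapply() guarantees a logical vector even if the data frame has zero columns.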