I have a dataset that is being used to create an automated dashboard. Essentially, it looks at the relationship between certain conditions and the cost of care on a month-by-month basis for a health care institution. What I want to do, in pseudocode, is:
dataset %>% select(-c("columns where the average value is lower than X"))
No amount of googling seems to be getting me close.
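We can use select_if (superseded by select(where(...)) in dplyr 1.0.0 and later, but still available). Note that the predicate below keeps the numeric columns whose mean is below val; to drop them instead, negate it as ~ !(is.numeric(.) && mean(.) < val).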
library(dplyr)
val <- 10
dataset %>%
  select_if(~ is.numeric(.) && mean(.) < val)
Or, using base R (is.numeric also catches integer columns, which a comparison of class against "numeric" would miss):
dataset[, names(which(colMeans(dataset[sapply(dataset, is.numeric)]) < val)),
        drop = FALSE]
#  col3
#1    3
#2    4
#3    7
Example data used above:
dataset <- data.frame(col1 = c('A', 'B', 'C'), col2 = c(10, 8, 15),
                      col3 = c(3, 4, 7), stringsAsFactors = FALSE)
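Since the question asks to remove the low-mean columns rather than keep them, here is a minimal base R sketch of that variant. It reuses the example data and val from above; the names low_mean and result are only illustrative, and na.rm = TRUE is an assumption about how missing values should be treated.

```r
# Example data (same as above)
dataset <- data.frame(col1 = c("A", "B", "C"), col2 = c(10, 8, 15),
                      col3 = c(3, 4, 7), stringsAsFactors = FALSE)
val <- 10

# TRUE for numeric columns whose mean is below val; non-numeric columns give FALSE
# (na.rm = TRUE is an assumption: missing values are ignored when averaging)
low_mean <- vapply(
  dataset,
  function(col) is.numeric(col) && mean(col, na.rm = TRUE) < val,
  logical(1)
)

# Drop the flagged columns, keeping everything else (including non-numeric columns)
result <- dataset[, !low_mean, drop = FALSE]
print(names(result))
```

With this data, col3 (mean about 4.67) is dropped, while col1 (character) and col2 (mean 11) are kept.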