Say I have a dataset called iris. I want to create an indicator variable called sepal_length_group in this dataset. The values of this indicator will be p25, p50, p75, and p100. For example, I want sepal_length_group to be equal to "p25" for an observation if the Species is "setosa" and its Sepal.Length is less than or equal to the 25th percentile of Sepal.Length among all observations classified as "setosa". I wrote the following code, but it generates all NAs:
library(skimr)
sepal_length_distribution <- iris %>% group_by(Species) %>% skim(Sepal.Length) %>% select(3, 9:12)
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),2], "p25", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),2] &
Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),3], "p50", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),3] &
Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),4], "p75", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),4] &
Sepal.Length < sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),5], "p100", NA))
Any help will be highly appreciated!
This can be done simply with the function cut, as @Camille commented:
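library(tidyverse)

iris %>%
  group_by(Species) %>%   # quantiles are computed within each species
  mutate(cat = cut(Sepal.Length,
                   breaks = quantile(Sepal.Length, c(0, .25, .5, .75, 1)),
                   labels = paste0('p', c(25, 50, 75, 100)),
                   include.lowest = TRUE))   # keep each group's minimum in p25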
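To get the column under the name asked for in the question and to check the result, something like the following should work. It is only a sketch: the names iris_2 and sepal_length_group are taken from the question, and loading dplyr alone (rather than the whole tidyverse) is my assumption about what is installed.

```r
library(dplyr)

# Same cut()/quantile() approach, stored under the names used in the question.
iris_2 <- iris %>%
  group_by(Species) %>%
  mutate(sepal_length_group = cut(
    Sepal.Length,
    breaks = quantile(Sepal.Length, c(0, .25, .5, .75, 1)),  # per-species quartile edges
    labels = paste0("p", c(25, 50, 75, 100)),
    include.lowest = TRUE                                    # the minimum falls in p25
  )) %>%
  ungroup()

# Sanity check: how many observations land in each bin, per species.
iris_2 %>% count(Species, sepal_length_group)
```

Note that cut() stops with an error if two of the quantiles are equal, because the breaks are then not unique. That does not happen for Sepal.Length in iris, but it can with heavily tied data.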