I have a continuous variable in my dataset with the following distribution:
summary(emissions$NMVOC_gram)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
       0      256      547    15802     1074 50818630
How can I categorize this variable into unequal levels (extremely low, low, medium, high, extremely high) in R or Excel? I have added a picture of what I should end up with.
Thank you for the help.
I tried the cut function in R, but the result was not what I expected. Actually, I do not know how I should define the breaks; in my data the 3rd quartile is lower than the mean.
Assuming you want to cut the data into quintiles (5 categories), here is an example using iris. The plot shows counts only, not percentages.
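library(tidyverse)
# Quintile break points: 0%, 20%, 40%, 60%, 80%, 100%
xs <- quantile(iris$Sepal.Length, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1))
iris <- iris %>%
  # include.lowest = TRUE keeps the minimum value in the first bin instead of NA
  mutate(Sepal_length_cat = cut(Sepal.Length, breaks = xs, include.lowest = TRUE,
                                labels = c("ext low", "low", "med", "high", "ext high")))
# Bar chart of the number of observations per category
ggplot(iris, aes(Sepal.Length %>% {Sepal_length_cat})) +
  geom_bar() +
  coord_flip()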
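
Since the question asks for unequal levels on a heavily right-skewed variable, the same quantile-plus-cut idea might be adapted roughly as follows. This is only a sketch in base R: nmvoc is simulated data standing in for emissions$NMVOC_gram, and the cut-point probabilities (5%, 25%, 75%, 95%) are an assumption chosen for illustration, not something taken from the question.

```r
# Simulated right-skewed values standing in for emissions$NMVOC_gram (hypothetical data)
set.seed(42)
nmvoc <- rlnorm(1000, meanlog = 6, sdlog = 2)

# Unequal bins: bottom 5%, next 20%, middle 50%, next 20%, top 5%
# (these probabilities are an assumption; adjust them to the levels you need)
probs <- c(0, 0.05, 0.25, 0.75, 0.95, 1)
breaks <- quantile(nmvoc, probs = probs)

# include.lowest = TRUE keeps the minimum value in the first bin instead of NA
nmvoc_cat <- cut(nmvoc, breaks = breaks, include.lowest = TRUE,
                 labels = c("ext low", "low", "med", "high", "ext high"))

# Number of observations per category
table(nmvoc_cat)
```

Because the breaks come from quantiles rather than from the mean, a few extreme values (like the maximum in the question) do not distort the bins. If many values are tied, for example a lot of zeros, the quantile breaks may not be unique and cut() will stop with an error, so the probabilities would need adjusting in that case.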