r, variables, regression, categorical-data, cut

How do I cut a continuous, skewed variable into categories from extremely low to extremely high?


I have a continuous variable in my dataset with the following distribution:

summary(emissions$NMVOC_gram)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
       0      256      547    15802     1074 50818630 

How can I categorize this variable into unequal levels (extremely low, low, medium, high, extremely high) in R or Excel? I have added a picture of what I should end up with.

Thank you for the help.

I tried the cut function in R, but the result was not what I expected. I do not know how I should define the breaks; in my data the 3rd quartile is lower than the mean.


Solution

  • Presuming you want to cut the data into quintiles (5 categories), the example below uses the built-in iris data. The plot shows counts only, not percentages.

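    library(tidyverse)

    # Quintile break points: 0%, 20%, 40%, 60%, 80%, 100%
    xs <- quantile(iris$Sepal.Length, probs = seq(0, 1, by = 0.2))

    iris <- iris %>%
      # include.lowest = TRUE keeps the minimum in the first bin (otherwise it becomes NA)
      mutate(Sepal_length_cat = cut(Sepal.Length, breaks = xs, include.lowest = TRUE,
                                    labels = c("ext low", "low", "med", "high", "ext high")))

    # Horizontal bar chart of counts per category
    ggplot(iris, aes(Sepal_length_cat)) +
      geom_bar() +
      coord_flip()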
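
The same quintile approach can be tried on a skewed variable like the one in the question. This is only a sketch: the real `emissions$NMVOC_gram` column is not available here, so `nmvoc` is a simulated stand-in (a few zeros plus log-normal values), and the parameters are arbitrary assumptions chosen to give a long right tail.

```r
# Simulated stand-in for emissions$NMVOC_gram (assumption: not the real data)
set.seed(42)
nmvoc <- c(rep(0, 20), rlnorm(980, meanlog = 6.3, sdlog = 1.5))

# Quintile break points computed from the data itself
brks <- quantile(nmvoc, probs = seq(0, 1, by = 0.2))

# include.lowest = TRUE so the minimum (0) lands in the first bin rather than NA
nmvoc_cat <- cut(nmvoc, breaks = brks, include.lowest = TRUE,
                 labels = c("ext low", "low", "med", "high", "ext high"))

# Counts per category
table(nmvoc_cat)
```

Because the breaks are quantiles rather than equal-width intervals, the long right tail does not leave most categories empty. One caveat: `cut()` stops with an error if the break points are not unique, which can happen when a large share of the values are tied (for example, many zeros).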