
Create variable in data frame using another column's Quartile value


I want to create a variable in a data frame that categorizes the observations based on the quartile/median value of a column.

Below is what I tried.

Name <- c("name1", "name2", "name3", "name4", "name5", "name6")
Age <- c(49, 12, 29, 55, 25, 19)

df9 <- data.frame(Name, Age)

df9$catoG[df9$Age <= quantile(df9$Age, 0.25)] <- "Young"
df9$catoG[df9$Age > quantile(df9$Age, 0.25) & df9$Age <= median(df9$Age)] <- "Adult"
df9$catoG[df9$Age > median(df9$Age)] <- "Elder"

The output I received is

   Name Age catoG
1 name1  49 Elder
2 name2  12 Young
3 name3  29 Elder
4 name4  55 Elder
5 name5  25 Adult
6 name6  19 Young

Is there a more efficient way to achieve the same result in R?


Solution

  • cut is your friend for any task that involves splitting a vector into ranges:

    df9$new = cut(df9$Age,
                  breaks = c(-Inf, quantile(df9$Age, c(0.25, 0.5)), Inf),
                  labels = c('Young', 'Adult', 'Elder'))
    
    #   Name Age catoG   new
    #1 name1  49 Elder Elder
    #2 name2  12 Young Young
    #3 name3  29 Elder Elder
    #4 name4  55 Elder Elder
    #5 name5  25 Adult Adult
    #6 name6  19 Young Young
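
To see why the two columns agree, here is a minimal self-contained sketch that rebuilds the data from the question and runs both approaches side by side. The variable name `cuts` is introduced here only for illustration. It relies on `cut()` defaulting to `right = TRUE`, so each bin includes its upper boundary, which mirrors the `<=` comparisons in the question.

```r
# Rebuild the example data from the question.
df9 <- data.frame(
  Name = c("name1", "name2", "name3", "name4", "name5", "name6"),
  Age  = c(49, 12, 29, 55, 25, 19)
)

# Cut points: first quartile and median of Age.
# `cuts` is an illustrative name, not part of the original answer.
cuts <- quantile(df9$Age, c(0.25, 0.5))

# With the default right = TRUE the bins are
# (-Inf, Q1], (Q1, median], (median, Inf].
df9$new <- cut(df9$Age,
               breaks = c(-Inf, cuts, Inf),
               labels = c("Young", "Adult", "Elder"))

# The manual approach from the question, for comparison.
df9$catoG <- NA_character_
df9$catoG[df9$Age <= cuts[1]] <- "Young"
df9$catoG[df9$Age > cuts[1] & df9$Age <= cuts[2]] <- "Adult"
df9$catoG[df9$Age > cuts[2]] <- "Elder"

# cut() returns a factor, so convert it to character before comparing.
identical(as.character(df9$new), df9$catoG)
```

Two things to keep in mind: `cut()` gives back a factor rather than a character vector, and it stops with an error if the breaks are not unique, which can happen when the first quartile and the median of a column are the same value.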