python r categorical-data binning

How is age classified as a categorical variable?


OK, this question is very basic, but I can't figure it out, so I need your help. I understand the idea of splitting age into categories. For example: [good graph]

I don't understand how the model knows that the <30 category comes before the 31-45 category, that the 31-45 category comes before the 46-60 category, and so on. How does the model know not to produce this graph: [bad graph]

Thanks!


Solution

  • Consider this example:

    age <- 1:100
    
    # cut() already returns a factor; its levels follow the order of the breaks
    fctr <- cut(age, breaks = c(0, 25, 50, 75, 100))
    
    print(levels(fctr))
    
    [1] "(0,25]"   "(25,50]"  "(50,75]"  "(75,100]"
    

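    Here you can see how the levels are ordered. cut() creates the levels in the order of the breaks, so the categories are stored in ascending order rather than alphabetically. This is the order that plot and ggplot2 will use. You can change this order in the following way: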

    fctr2 <- factor(fctr, levels = levels(fctr)[c(2, 1, 3, 4)])
    
    print(levels(fctr2))
    
    [1] "(25,50]"  "(0,25]"   "(50,75]"  "(75,100]"
    

    If you work with factors often, consider using the forcats package.
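
To tie this back to the age groups in the question, here is a minimal base R sketch. The ages, the break points, and the labels are made up for illustration and are not taken from the asker's data. It shows that the order of the categories comes from the order of the breaks passed to cut(), and that ordered_result = TRUE additionally marks the factor as ordered, so comparisons between categories work:

```r
# Hypothetical ages, chosen only for illustration.
age <- c(22, 28, 35, 44, 47, 52, 58, 31)

# Assumed bins (0,30], (30,45], (45,60]; the labels are illustrative.
# The levels are created in the order of the breaks, not alphabetically.
age_group <- cut(
  age,
  breaks = c(0, 30, 45, 60),
  labels = c("<=30", "31-45", "46-60"),
  ordered_result = TRUE  # make it an ordered factor
)

# Levels in bin order.
print(levels(age_group))

# Counts are reported in level order as well.
print(table(age_group))

# With an ordered factor, categories can be compared directly.
print(age_group[1] < age_group[3])
```

Plots and tables follow the level order whether or not the factor is ordered; the ordered flag mainly matters when you want comparisons or order-aware contrasts in a model.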