
Changing Continuous Ranges to Categorical in R


I was trying to convert some integer values into categorical ranges, but something happened that I did not understand. I managed to fix it and get what I wanted, but I still don't understand why it happened.

The variable holds integers from 0 to 12. The following code left 10, 11 and 12 out of the 5+ category.

py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==0]<-"0"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==1]<-"1"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==2]<-"2"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==3]<-"3"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==4]<-"4"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"
py2$Daily.Whole.Grain<-as.factor(py2$Daily.Whole.Grain)

But when I change the order of the conversions so that the 5+ step runs first, 10, 11 and 12 are included.

py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==0]<-"0"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==1]<-"1"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==2]<-"2"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==3]<-"3"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==4]<-"4"

Can anyone explain why the first version leaves the double-digit integers out? Thanks very much.


Solution

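  • As @CathG mentioned, the problem comes from the column being converted from numeric to character. The first assignment of a string such as "0" coerces the whole column to character, so every later comparison is a string comparison: "10", "11" and "12" start with "1", which sorts before "5", so the >= 5 test is FALSE for them. Running the >= 5 step first works because the column is still numeric at that point. A perhaps better solution is the cut function, which gives you a factor based on cut-points of a variable: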

    py2 <- data.frame(Daily.Whole.Grain = 1:10)
    py2$Daily.Whole.Grain1 <- cut(py2$Daily.Whole.Grain, 
        breaks = c(1:5, Inf), right = FALSE, labels = c(1:4, "5+"))
    py2
       Daily.Whole.Grain Daily.Whole.Grain1
    1                  1                  1
    2                  2                  2
    3                  3                  3
    4                  4                  4
    5                  5                 5+
    6                  6                 5+
    7                  7                 5+
    8                  8                 5+
    9                  9                 5+
    10                10                 5+
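
The coercion described above can be checked directly. The following is only a minimal sketch: the vectors x and grain are made-up examples (not the asker's py2 data), and the breaks are shifted to start at 0 on the assumption that the real column runs from 0 to 12 as stated in the question.

```r
# Hypothetical values covering the 0..12 range described in the question
x <- c(0, 3, 4, 5, 9, 10, 11, 12)

# Assigning a string into a numeric vector coerces the whole vector to character
x_chr <- x
x_chr[x_chr == 0] <- "0"
class(x_chr)   # "character"

# 5 is coerced to "5" and the comparison is done on strings,
# so "10", "11" and "12" compare as smaller than "5"
x_chr >= 5     # FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE

# cut() works on the numeric values and never goes through strings.
# breaks = c(0:5, Inf) with right = FALSE gives [0,1), [1,2), ..., [4,5), [5,Inf)
grain <- 0:12
grain_cat <- cut(grain, breaks = c(0:5, Inf), right = FALSE,
                 labels = c(0:4, "5+"))
table(grain_cat)
```

With the breaks starting at 1, as in the 1:10 example above, a value of 0 would fall outside every interval and come back as NA, which is why the sketch uses 0:5 instead.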