
Why aren't double-digit numbers included in my "less than" and "greater than" commands?


I am fairly new to R, and I am using it for my thesis. I tried to create a set of commands that recode a range of numeric values as a categorical variable. The possible values in my dataset range from 1 to 13. For some reason, none of the double-digit values get grouped into the factor level I created, and I don't know why.

Here is my code creating the categorical groups, converting it to factor levels, and the output:

> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions <=2] <- "≤2"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=3 & Desc$Number.of.Chronic.conditions <5] <- "3 - 4"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=5 & Desc$Number.of.Chronic.conditions <7] <- "5 - 6"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=7 & Desc$Number.of.Chronic.conditions <9] <- "7 - 8"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=9] <- "≥9"
> 
> Desc$Number.of.Chronic.conditions <- factor(Desc$Number.of.Chronic.conditions)

> print(Desc$Number.of.Chronic.conditions)
[1] 5 - 6 7 - 8 ≤2    5 - 6 5 - 6 3 - 4 7 - 8 ≤2    7 - 8 3 - 4 5 - 6 ≤2    5 - 6 ≥9    ≤2    5 - 6 5 - 6 7 - 8 10   
[20] 7 - 8 11    7 - 8 5 - 6 ≤2    3 - 4 5 - 6 ≥9    5 - 6 7 - 8 3 - 4 3 - 4 5 - 6 ≤2    ≤2    5 - 6 3 - 4 7 - 8 3 - 4
[39] ≤2    7 - 8 5 - 6 7 - 8 7 - 8 5 - 6 10    ≤2    ≤2    ≤2    ≤2    3 - 4 3 - 4 ≤2    ≤2    ≤2    7 - 8 ≤2    ≤2   
[58] 7 - 8 ≤2    3 - 4 3 - 4 ≤2    13    3 - 4 3 - 4 3 - 4 7 - 8 5 - 6 3 - 4 5 - 6 3 - 4 5 - 6 5 - 6 5 - 6 3 - 4 3 - 4
[77] 5 - 6 ≥9    ≤2    ≤2    10    3 - 4 7 - 8 11    7 - 8 5 - 6 3 - 4 3 - 4 ≥9    3 - 4 3 - 4 5 - 6 3 - 4 7 - 8 5 - 6
[96] 5 - 6 3 - 4 12    10    ≤2    5 - 6 5 - 6 3 - 4 3 - 4 3 - 4 5 - 6 5 - 6 3 - 4 5 - 6 ≤2    5 - 6 3 - 4 5 - 6 3 - 4
[115] 3 - 4 ≤2    5 - 6 7 - 8 3 - 4 ≤2    3 - 4 7 - 8 5 - 6 7 - 8 5 - 6 ≤2    7 - 8 ≤2    ≤2    ≥9    7 - 8 ≥9    3 - 4
[134] 5 - 6 ≤2    5 - 6 3 - 4 ≤2    3 - 4 3 - 4 ≤2    5 - 6 3 - 4 ≤2    7 - 8 3 - 4 ≤2    ≤2    3 - 4 3 - 4 ≤2    10   
[153] 3 - 4 5 - 6 5 - 6 5 - 6 5 - 6 3 - 4 5 - 6 5 - 6 5 - 6 7 - 8 5 - 6 5 - 6 5 - 6 10    5 - 6 3 - 4 3 - 4 ≤2    3 - 4
[172] ≤2    7 - 8 ≤2    ≤2    7 - 8 ≤2    7 - 8 10    5 - 6 ≥9    3 - 4 3 - 4
Levels: ≤2 ≥9 10 11 12 13 3 - 4 5 - 6 7 - 8

> summary(Desc$Number.of.Chronic.conditions)
   ≤2    ≥9    10    11    12    13 3 - 4 5 - 6 7 - 8 
   40     7     7     2     1     1    49    49    27 

Solution

  • Even if you start with a vector of integers, assigning a character string to any element converts the whole vector to strings.

    See:

    Desc<-data.frame(Number.of.Chronic.conditions=c(1:15))
    
    str(Desc)  ## str shows the structure of an object
    'data.frame':   15 obs. of  1 variable:
     $ Number.of.Chronic.conditions: int  1 2 3 4 5 6 7 8 9 10 ... 
    
    Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions <=2] <- "≤2"
    
    str(Desc)
    'data.frame':   15 obs. of  1 variable:
     $ Number.of.Chronic.conditions: chr  "≤2" "≤2" "3" "4" ...
    

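    Comparison with >= follows different rules for numbers and strings. When a string is compared with a number, R converts the number to a string and compares the two as text, character by character. Both "9" and 9 are >= 9; however, "10" is not >= 9, because "1" sorts before "9".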
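
    A minimal sketch of that behaviour, using a throwaway vector x (a hypothetical name, not from the question) in place of the data frame column:

    ```r
    # Numeric comparison behaves as expected
    num_cmp <- 10 >= 9        # TRUE

    # String vs. number: the number is coerced to "9" and compared as text
    chr_cmp <- "10" >= 9      # FALSE, because "1" sorts before "9"
    same_digit <- "9" >= 9    # TRUE

    # One character assignment turns the whole integer vector into strings
    x <- 1:15
    x[x <= 2] <- "≤2"
    class(x)                  # "character"
    x[10] >= 9                # FALSE: x[10] is now the string "10"
    ```

    This is why only the single-digit value 9 ended up in the "≥9" level, while 10 to 13 were left as they were.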

    There are at least two ways to solve this: perform all the binning in one step with a function such as dplyr's mutate(case_when(...)), or write the binned values into their own column so that the numeric column is never overwritten:

    Desc<-data.frame(Number.of.Chronic.conditions=c(1:15))
    
    Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions <=2] <- "≤2"
    Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=3 & Desc$Number.of.Chronic.conditions <5] <- "3 - 4"
    Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=5 & Desc$Number.of.Chronic.conditions <7] <- "5 - 6"
    Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=7 & Desc$Number.of.Chronic.conditions <9] <- "7 - 8"
    Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=9] <- "≥9"
    
    Desc$Factor.of.Chronic.conditions <- factor(Desc$Factor.of.Chronic.conditions)
    
    table(Desc$Factor.of.Chronic.conditions)
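
    As a one-step alternative that is not part of the original answer, base R's cut() can do the binning in a single call. This is a sketch under the assumption that the values are whole numbers, so the break points 2, 4, 6 and 8 act as the upper bounds of the bins:

    ```r
    Desc <- data.frame(Number.of.Chronic.conditions = 1:15)

    # Intervals are (-Inf, 2], (2, 4], (4, 6], (6, 8], (8, Inf]
    # because cut() closes each interval on the right by default
    Desc$Factor.of.Chronic.conditions <- cut(
      Desc$Number.of.Chronic.conditions,
      breaks = c(-Inf, 2, 4, 6, 8, Inf),
      labels = c("≤2", "3 - 4", "5 - 6", "7 - 8", "≥9")
    )

    table(Desc$Factor.of.Chronic.conditions)
    ```

    Because the numeric column is never overwritten, no comparison is made against strings. cut() also keeps the levels in the order given in labels, whereas factor() on a character vector sorts them as text.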