
R How To Set Up Cut Function


set.seed(1)
DATA = data.frame(X = sample(c(0:100), 1000, replace = TRUE))
DATA$CUT = with(DATA, cut(X, breaks = c(10,20,30,40,50,60,70,80,90), right = FALSE))

I want the groups 0-9, 10-19, 20-29, ..., 80-89, 90+, but no matter how I call cut() I do not get these breaks.


Solution

  • You need to include the extreme bounds in breaks, because cut() returns NA for any value that falls outside the range the breaks cover. For example:

    breaks <- c(0,10,20,30,40,50,60,70,80,90, Inf)
    DATA <- transform(DATA, CUT=cut(X, breaks=breaks, right = FALSE))
    

    which results in

    table(DATA$CUT)
    #   [0,10)  [10,20)  [20,30)  [30,40)  [40,50)  [50,60)  [60,70)  [70,80)  [80,90) [90,Inf) 
    #     102       84       96      102       96      102       90       94       122      112 
    

    cut() is designed for continuous values, so it reports half-open intervals. For integer data, [0,10) contains exactly the values 0 through 9, so it is the same group as [0,9] or 0-9.
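
    To see why the extreme bounds matter, here is a minimal sketch on a small hypothetical vector x (not the DATA from the question). It assumes the default include.lowest = FALSE, and compares the breaks from the question with breaks that start at 0 and end at Inf:

    ```r
    # Hypothetical toy values chosen to sit on the edges of the bins.
    x <- c(0L, 9L, 10L, 19L, 90L, 100L)

    # Breaks as in the question: nothing covers values below 10 or at/above 90,
    # so those values are returned as NA.
    narrow <- cut(x, breaks = seq(10, 90, by = 10), right = FALSE)
    is.na(narrow)

    # Breaks with the extreme bounds added: every value falls in some bin.
    wide <- cut(x, breaks = c(seq(0, 90, by = 10), Inf), right = FALSE)
    anyNA(wide)

    # With right = FALSE the bins are closed on the left, so 0 and 9 share [0,10).
    as.character(wide[1:2])
    ```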

    If you want to set the labels, you can do

    breaks <- c(0,10,20,30,40,50,60,70,80,90, Inf)
    labels <- paste(head(breaks, -1), tail(breaks, -1)-1, sep="-")
    DATA <- transform(DATA, CUT=cut(X, breaks=breaks, labels=labels, right = FALSE))
    

    which now results in

    table(DATA$CUT)
    #    0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89 90-Inf 
    #    102     84     96    102     96    102     90     94    122    112
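
    The last label comes out as 90-Inf because Inf - 1 is still Inf. If you would rather have the 90+ label from the question, one possible variation is to build the open-ended label separately. This is only a sketch; the helper names lower and upper are introduced here for illustration:

    ```r
    set.seed(1)
    DATA <- data.frame(X = sample(0:100, 1000, replace = TRUE))

    breaks <- c(seq(0, 90, by = 10), Inf)
    lower <- head(breaks, -1)        # left edge of each bin
    upper <- tail(breaks, -1) - 1    # last integer in each bin (Inf for the final bin)

    # Use "lower-upper" for the bounded bins and "lower+" for the open-ended one.
    labels <- ifelse(is.finite(upper),
                     paste(lower, upper, sep = "-"),
                     paste0(lower, "+"))

    DATA$CUT <- cut(DATA$X, breaks = breaks, labels = labels, right = FALSE)
    levels(DATA$CUT)
    ```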