
Is there a way to write these multiple break points (with equal step length) in R function cut more efficiently?


This is what I've done. It gives the result I want, but in a very inefficient way.

cut(df1$wage, breaks = c(-Inf, 20000,21000,22000,23000,24000,25000,26000,27000,28000,29000,30000, Inf), 
         include.lowest=TRUE, dig.lab=10, labels = c("-20 000", "20 000-21 000", "21 000-22 000", "22 000-23 000", "23 000-24 000",
                                                    "24 000-25 000", "25 000-26 000", "26 000-27 000", "27 000-28 000", "28 000-29 000", "29 000-30 000", "30 000-"))

I want a lowest bin that includes all values up to some specified value (20 000 in the example), and likewise a top bin for all values above 30 000.

I would also like to be able to change the step length between the break points, which is 1000 in the example, to say 500, without having to list every break point explicitly.

Ideally, the labels would also follow the break points I specify; writing them out by hand is just as inefficient.

For the breaks part, I came close with breaks = seq(from = 20000, to = 30000, by = 1000), but couldn't figure out how to also include the bottom and top bins as in the example above.


Solution

  • You can store the inner break points in a vector and reuse it to build both the breaks and the labels arguments:

    breaks <- seq(from = 20000, to = 30000, by = 1000)
    
    cut(df1$wage, breaks = c(-Inf, breaks, Inf), include.lowest = TRUE, dig.lab = 10,
        labels = c("-20000", paste(head(breaks, -1), tail(breaks, -1), sep = "-"), "30000-"))
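
To change the step length without touching the labels, the same idea can be wrapped in a small function. The following is only a sketch: `cut_steps()` and the sample `wage` vector are hypothetical names made up for illustration, not part of base R or of the original data. It assumes `to - from` is a multiple of `by`; otherwise `seq()` stops short of `to` and the top bin starts at the last break actually generated.

```r
# Hypothetical helper (not part of base R): bins x into equal steps between
# `from` and `to`, plus one open-ended bin below and one above.
cut_steps <- function(x, from, to, by) {
  inner <- seq(from = from, to = to, by = by)
  # format() with scientific = FALSE avoids labels such as "1e+05"
  txt <- format(inner, scientific = FALSE, trim = TRUE)
  labels <- c(paste0("-", txt[1]),                            # bottom bin
              paste(head(txt, -1), tail(txt, -1), sep = "-"), # inner bins
              paste0(txt[length(txt)], "-"))                  # top bin
  cut(x, breaks = c(-Inf, inner, Inf), labels = labels)
}

# Made-up sample values, binned with a step of 500 instead of 1000
wage <- c(15000, 20000, 20400, 20500, 29999, 30000, 45000)
res <- cut_steps(wage, from = 20000, to = 30000, by = 500)
table(res)
```

Because `cut()` uses right-closed intervals by default, a value exactly on a break point (for example 20000 or 20500) falls into the bin that ends at that value.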