Tags: r, classification, categorical-data, quantile

How to cut a vector into groups with equal number of observations in R?


How can I cut a vector into groups containing approximately equal numbers of observations in R? I also need to know the cut-point values so that I can classify future input.

Basically, I am trying to convert a continuous variable into a categorical one with an equal number of observations in each category, and I need to know the borders of each category. Please help.

For example:

    bla <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
    blaClass <- cut(bla, 3)

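Here each class in blaClass contains the same number of observations, but only because the values are evenly spaced: cut(bla, 3) splits the range into three intervals of equal width, not of equal count. The problem is that my real data has many observations that are very close to each other, or even identical, so it is hard to divide them into groups of equal size.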

I tried using quantileCut, but it gives me a "breaks are not unique" error.


Solution

  • You may use dplyr::ntile() to cut them into quantiles. For example,

    library(dplyr)
    ntile(bla, 3)
     [1] 1 1 1 1 2 2 2 2 3 3 3 3
    

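    This splits the vector into three rank-based groups of (nearly) equal size, so the borders fall roughly at the 1/3 and 2/3 quantiles. Because ntile() works on ranks, tied values can end up in different groups, and it returns only the group labels, not the cut points.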
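Since the question also asks for the borders and for a way to classify future input, here is a minimal base-R sketch of one possible approach. It is not what ntile() does internally: it takes the cut points from quantile() and passes them to cut(). The vectors x and new_x are made-up illustration data. Wrapping the breaks in unique() avoids the "breaks are not unique" error, at the cost of producing fewer groups when many values are tied.

```r
# Hypothetical data with heavy ties (half of the values are 1)
x <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7)

# Tercile cut points: the 0, 1/3, 2/3 and 1 quantiles.
# unique() collapses duplicated cut points, which cut() would reject.
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = 4), names = FALSE))
print(breaks)  # the borders of the categories

# Classify the original data; include.lowest = TRUE keeps min(x) in the first bin
groups <- cut(x, breaks = breaks, include.lowest = TRUE)
print(table(groups))

# Classify future input with the same borders.
# Values outside [min(x), max(x)] fall in no bin and become NA.
new_x <- c(0.5, 2.5, 7)
new_groups <- cut(new_x, breaks = breaks, include.lowest = TRUE)
print(new_groups)
```

With this data the lowest two cut points coincide, so only two groups remain and they are not equal in size. When values are heavily tied, exactly equal counts are only possible if identical values are allowed to land in different groups, which is what the rank-based ntile() does.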