Search code examples
rpercentilebinning

r bin equal deciles


I have a dataset containing over 6,000 observations, each record having a score ranging from 0-100. Below is a sample:

+-----+-------+
| uID | score |
+-----+-------+
|   1 |    77 |
|   2 |    61 |
|   3 |    74 |
|   4 |    47 |
|   5 |    65 |
|   6 |    51 |
|   7 |    25 |
|   8 |    64 |
|   9 |    69 |
|  10 |    52 |
+-----+-------+

I want to bin them into equal deciles based upon their rank order relative to their peers within the score column with cutoffs being at every 10th percentile, as seen below:

+-----+-------+-----------+----------+
| uID | score | position% | scoreBin |
+-----+-------+-----------+----------+
|   7 |    25 | 0.1       |        1 |
|   4 |    47 | 0.2       |        2 |
|   6 |    51 | 0.3       |        3 |
|  10 |    52 | 0.4       |        4 |
|   2 |    61 | 0.5       |        5 |
|   8 |    64 | 0.6       |        6 |
|   5 |    65 | 0.7       |        7 |
|   9 |    69 | 0.8       |        8 |
|   3 |    74 | 0.9       |        9 |
|   1 |    77 | 1         |       10 |
+-----+-------+-----------+----------+

So far I've tried cut, cut2, tapply, etc. I think I'm on the right logic path, but I have no idea on how to apply them to my situation. Any help is greatly appreciated.


Solution

  • I would use ntile() in dplyr.

    library(dplyr)
    
    score<-c(77,61,74,47,65,51,25,64,69,52)
    ntile(score, 10)
    
    ##[1] 10  5  9  2  7  3  1  6  8  4
    
    scoreBin<- ntile(score, 10)