I have a dataset containing over 6,000 observations, each record having a score ranging from 0-100. Below is a sample:
+-----+-------+
| uID | score |
+-----+-------+
| 1 | 77 |
| 2 | 61 |
| 3 | 74 |
| 4 | 47 |
| 5 | 65 |
| 6 | 51 |
| 7 | 25 |
| 8 | 64 |
| 9 | 69 |
| 10 | 52 |
+-----+-------+
I want to bin them into equal deciles based upon their rank order relative to their peers within the score column with cutoffs being at every 10th percentile, as seen below:
+-----+-------+-----------+----------+
| uID | score | position% | scoreBin |
+-----+-------+-----------+----------+
| 7 | 25 | 0.1 | 1 |
| 4 | 47 | 0.2 | 2 |
| 6 | 51 | 0.3 | 3 |
| 10 | 52 | 0.4 | 4 |
| 2 | 61 | 0.5 | 5 |
| 8 | 64 | 0.6 | 6 |
| 5 | 65 | 0.7 | 7 |
| 9 | 69 | 0.8 | 8 |
| 3 | 74 | 0.9 | 9 |
| 1 | 77 | 1 | 10 |
+-----+-------+-----------+----------+
So far I've tried cut, cut2, tapply, etc. I think I'm on the right logic path, but I have no idea on how to apply them to my situation. Any help is greatly appreciated.
I would use ntile()
in dplyr
.
library(dplyr)
score<-c(77,61,74,47,65,51,25,64,69,52)
ntile(score, 10)
##[1] 10 5 9 2 7 3 1 6 8 4
scoreBin<- ntile(score, 10)