
# How to break my sample into uneven categories?

We usually use `quartiles`, `quantiles`, or `ntiles` to split a sample. We can also use the function `cut`.

I have a numeric variable that I would like to use to split my sample into three categories, but the categories should not be evenly sized. For example, the `quartile` function would split the sample into four equal quartiles: the 0 to 25, 26 to 50, 51 to 75, and 76 to 100 percentiles. The first three functions I mentioned therefore cannot do the job. We could probably split the variable using `cut`, but I don't know how to do that in terms of percentiles. I would like to create a variable that splits the sample into the bottom 0 to 20th percentile, then the 21st to 60th, then the 61st to 100th.

Here is a reproducible example:

``````
library(dplyr)
set.seed(1)
df <- tibble(
  V1 = round(runif(1000, min = 1, max = 1000)),
  V2 = round(runif(1000, min = 1, max = 3)),
  V3 = round(runif(1000, min = 1, max = 10)))

df$V2 = as.factor(df$V2)
df$V3 = as.factor(df$V3)

df = df %>% group_by(V2, V3) %>%
  mutate(quartile = ntile(V1, 4))
``````

Solution

• I'm not 100% sure if this is what you're looking for, and I'll admit it's not the most elegant code ever written, but would something like:

``````
# Define your percentile limits as row-count cutoffs
# (this is just based on googling how to calculate percentiles)
cut.20 <- 20/100 * length(df$V1)
cut.60 <- 60/100 * length(df$V1)

df <- df %>%
  ungroup() %>%                     # drop the earlier V2/V3 grouping so the index spans the whole sample
  arrange(V1) %>%
  mutate(index = row_number()) %>%  # rank of each row after sorting by V1
  group_by(V2, V3) %>%              # restores the grouping; the labels below use the whole-sample index
  mutate(centile = case_when(index <= cut.20 ~ "0-20",
                             index <= cut.60 ~ "21-60",
                             TRUE ~ "61-100"))
``````

do what you're looking for?
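
Since the question mentions `cut`, here is a minimal sketch of that route as well: compute the break points with `quantile()` at the 0, 20, 60 and 100 percent marks and pass them to `cut()`. This is an illustration, not the approach from the code above. The names `probs`, `breaks` and `centile` are just placeholders, and it works on a plain vector built the same way as `V1` in the question.

```r
set.seed(1)
V1 <- round(runif(1000, min = 1, max = 1000))

# Uneven percentile boundaries: 0-20, 21-60, 61-100
probs <- c(0, 0.2, 0.6, 1)

# Values of V1 at those percentiles, used as break points
breaks <- quantile(V1, probs = probs)

# include.lowest = TRUE keeps the minimum value in the first bin
centile <- cut(V1,
               breaks = breaks,
               labels = c("0-20", "21-60", "61-100"),
               include.lowest = TRUE)

table(centile)
```

The same `cut(V1, quantile(V1, probs), ...)` call should also work inside `mutate()` after `group_by(V2, V3)` if the percentiles are meant to be computed within each group. One caveat: `cut()` stops with an error when the break points are not unique, which can happen in very small groups or groups with many tied values.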