# How to break my sample into uneven categories?

We usually use `quartiles`, `quantiles`, or `ntiles` to split a sample. We can also use the function `cut`.

I have a numeric variable and I would like to split my sample into three categories, but these should not be evenly sized. For example, the `quartile` function would split it into four evenly sized quartiles: the 0 to 25, 26 to 50, 51 to 75, and 76 to 100 percentiles. Therefore, the first three functions I mentioned cannot do the job. We can probably split the variable using `cut`, but I don't know how to do it in terms of percentiles. **I would like to create a variable that splits the sample from the bottom 0 to the 20th percentile, then from 21 to 60, then from 61 to 100.**

Here is a reproducible example:

```
library(dplyr)

set.seed(1)
df <- tibble(
  V1 = round(runif(1000, min = 1, max = 1000)),
  V2 = round(runif(1000, min = 1, max = 3)),
  V3 = round(runif(1000, min = 1, max = 10))
)
df$V2 <- as.factor(df$V2)
df$V3 <- as.factor(df$V3)

df <- df %>%
  group_by(V2, V3) %>%
  mutate(quartile = ntile(V1, 4))
```

## Solution

I'm not 100% sure if this is what you're looking for, and I'll admit it's not the most elegant code ever written, but would something like:

```
# define your percentile limits as row positions in the sorted sample
# (this is just based on googling how to calculate percentiles)
cut.20 <- 20/100 * nrow(df)
cut.60 <- 60/100 * nrow(df)

df <- df %>%
  ungroup() %>%                     # df is still grouped by V2, V3 from the question's code
  arrange(V1) %>%
  mutate(index = row_number()) %>%  # position of each row after sorting by V1
  group_by(V2, V3) %>%
  mutate(centile = case_when(
    index <= cut.20 ~ "0-20",
    index <= cut.60 ~ "21-60",
    TRUE            ~ "61-100"
  ))
```

do what you're looking for?
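Since the question mentions `cut`, here is a minimal sketch of that route as well, in case it is closer to what you had in mind. It assumes you want percentiles of `V1` over the whole sample, and it feeds `cut` the break points returned by base R's `quantile`. The `probs` vector and the three labels are my own choices for illustration, not something from your code.

```r
library(dplyr)

# Same sample data as in the question
set.seed(1)
df <- tibble(
  V1 = round(runif(1000, min = 1, max = 1000)),
  V2 = factor(round(runif(1000, min = 1, max = 3))),
  V3 = factor(round(runif(1000, min = 1, max = 10)))
)

# Uneven percentile cut points: 0%, 20%, 60%, 100% (assumed from the question)
probs <- c(0, 0.2, 0.6, 1)

df <- df %>%
  mutate(
    centile = cut(
      V1,
      breaks = quantile(V1, probs = probs),   # convert percentiles to values of V1
      labels = c("0-20", "21-60", "61-100"),  # illustrative labels
      include.lowest = TRUE                   # keep the minimum in the first bin
    )
  )

# Inspect how many rows fall in each category
table(df$centile)
```

If you need the percentiles within each `V2`/`V3` combination instead, the same `mutate` should work after `group_by(V2, V3)`, provided every group has distinct cut points (`cut` stops with an error when the breaks are not unique, which can happen in very small groups or groups with many ties).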
