
percentile categories in R


I have a dataset similar to the following, and I want to categorise my values into high/medium/low based on percentiles. I use the code below, but I am confused about the 99th percentile and the values above it.

data(iris)
quantile(iris$Petal.Length, probs = 0.01)  # 1.149: all values below this are low
quantile(iris$Petal.Length, probs = 0.99)  # 6.7: the high category should start here

Questions:

  1. There are values greater than the 99th percentile (6.7). Where do these values belong?
  2. What is the medium category?

Solution

    1. The values greater than the 99th percentile are your top 1%. Following your argument, those are the high values, i.e. > 6.7.
    2. The medium category is everything below the 99th percentile, excluding what is below the 1st percentile, i.e. 1.149 < medium < 6.7.
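The three categories described above can be assigned in one pass. This is only a minimal sketch, not part of the original answer: the names `x`, `q`, and `petal_cat` are made up for illustration, and the choice to put values exactly equal to a cut point into "medium" is an assumption, since the thread only states strict inequalities.

```r
data(iris)
x <- iris$Petal.Length

# 1st and 99th percentiles (1.149 and 6.7 in the question)
q <- unname(quantile(x, probs = c(0.01, 0.99)))

# Strictly below the 1st percentile is "low", strictly above the 99th is
# "high"; everything else, including values equal to a cut point, is
# "medium" (an assumption, see the note above).
petal_cat <- ifelse(x < q[1], "low",
                    ifelse(x > q[2], "high", "medium"))

# Store as a factor so the categories keep their natural order
petal_cat <- factor(petal_cat, levels = c("low", "medium", "high"))

# Number of observations per category
table(petal_cat)
```

With cut points this extreme, almost every observation lands in "medium". If more balanced groups are wanted, other `probs` values (for example 0.05 and 0.95) can be passed to `quantile()` instead.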

    To make this clearer, here is a graph that shows the 5th and the 95th percentile of body height in humans. The values are assigned to three categories, as in your example.

    [Image: distribution of human body height, split at the 5th and 95th percentiles into three categories]