Tags: r, data-science, histogram, distribution

Interval Query for a Percent Distribution of Numbers


Let's say we have the following series of values:

10,10,10,10,10,10,14,14,14,22,22,28

A histogram with four bins of width 5 gives the following counts (count:[bin)) for this series:

9:[10,15)
0:[15,20)
2:[20,25)
1:[25,30)
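
For reference, these counts can be reproduced with a short sketch like the one below. The breaks at 10, 15, 20, 25 and 30 are assumed from the bins listed above, and right = FALSE is what makes the bins left-closed.

```r
x <- c(10, 10, 10, 10, 10, 10, 14, 14, 14, 22, 22, 28)

# Breaks assumed from the bins listed above: 10, 15, 20, 25, 30.
# right = FALSE gives left-closed, right-open bins such as [10,15).
h <- hist(x, breaks = seq(10, 30, by = 5), right = FALSE, plot = FALSE)

h$counts                      # 9 0 2 1
cumsum(h$counts) / length(x)  # cumulative share of values per bin
```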

Evidently, 9/12 (75%) of the values lie in the interval [10,15), and 11/12 (about 92%) of the values lie in the interval [10,25). I would like to come up with a function that takes a series and a percentage and returns the interval that contains that percentage of the values.

For example: query(Series=c(10,10,10,10,10,10,14,14,14,22,22,28), Pct=91) should return c(10,25). I am somewhat new to R, so it would be helpful if anyone could point me to a built-in function for this task or provide an implementation. Thanks in advance.


Solution

  • quantile(c(10,10,10,10,10,10,14,14,14,22,22,28),c(0,0.91))
    

    This doesn't quite produce your desired output of c(10,25), where you have either taken the midpoint between 22 and 28 or rounded up to a bin edge that suits plotting. With its default settings, quantile() interpolates linearly between the two neighbouring data points: 22 is the 10/11 quantile (90.9090...%) and 28 is the 100% quantile, so the 91% quantile comes out at 22 + 0.01 * (28 - 22) = 22.06.
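
    If you do want the answer snapped to histogram bin edges, one possible approach is to accumulate the bin counts and return the upper edge of the first bin at which the cumulative share reaches the requested percentage. The following is only a sketch: the function name and arguments follow the question, while the width argument (default 5) and the way the breaks are derived from the data are my assumptions.

```r
# Hypothetical helper: the 'width' argument and its default of 5 are assumptions,
# chosen to match the bins shown in the question.
query <- function(Series, Pct, width = 5) {
  # Build left-closed bins of equal width that cover the whole data range.
  lo <- floor(min(Series) / width) * width
  hi <- (floor(max(Series) / width) + 1) * width
  breaks <- seq(lo, hi, by = width)

  counts <- hist(Series, breaks = breaks, right = FALSE, plot = FALSE)$counts

  # Cumulative share of values up to and including each bin.
  cum_share <- cumsum(counts) / length(Series)

  # First bin at which the cumulative share reaches the requested percentage.
  k <- which(cum_share >= Pct / 100)[1]

  c(breaks[1], breaks[k + 1])
}

x <- c(10, 10, 10, 10, 10, 10, 14, 14, 14, 22, 22, 28)
query(Series = x, Pct = 91)  # 10 25
query(Series = x, Pct = 75)  # 10 15
```

    Note that the result depends entirely on the chosen bin width, and that the interval always starts at the lowest bin edge, which is how the example in the question reads.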