python pandas statistics distribution

Calculate percentile of value in column


I have a DataFrame with a column of numerical values. This column is not well approximated by a normal distribution. Given another numerical value that is not in this column, how can I calculate its percentile within the column? That is, if the value is greater than 80% of the values in the column but less than the other 20%, it would be in the 80th percentile.
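Under the common convention, a value's percentile rank is the percentage of column values strictly below it. That definition can be checked directly with a boolean comparison. A minimal sketch, assuming a hypothetical column `x` that holds the numbers 1 to 10 and a query value of 8.5, which exceeds 8 of the 10 values:

```python
import pandas as pd

# Hypothetical example column; any numeric Series works the same way
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
val = 8.5  # a value that does not appear in the column

# Boolean mask of values strictly below val; its mean is the fraction below
pct = (df["x"] < val).mean() * 100
print(pct)  # 80.0
```

This needs no sorting and makes no assumption about the column's distribution, but it scans the whole column for every query value.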


Solution

  • Sort the column and compare the value with the element at the chosen percentile position (20% here, or whatever percentile you need).

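    For example:

    def in_percentile(my_series, val, perc=0.2):
        # Sort the column's values in ascending order
        my_list = sorted(my_series.tolist())
        n = len(my_list)
        # Element at the `perc` position of the sorted data (clamped to the last index)
        cutoff = my_list[min(int(n * perc), n - 1)]
        # True if val is above that cutoff, i.e. above the lowest `perc` fraction of the data
        return val > cutoff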
    

    Or, if you want the actual percentile, sort the values first (searchsorted assumes a sorted array) and then use searchsorted:

    import numpy as np
    np.sort(my_series.values).searchsorted(val) / len(my_series) * 100
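For the threshold check, pandas can compute the cutoff itself with `Series.quantile`. A minimal sketch, where `above_quantile` is a hypothetical helper name. Note that `Series.quantile` interpolates linearly by default, so its cutoff may fall between two observed values rather than being an element of the column:

```python
import pandas as pd


def above_quantile(series: pd.Series, val: float, perc: float = 0.2) -> bool:
    # Series.quantile uses interpolation="linear" by default, so the cutoff
    # may lie between two observed values instead of being one of them
    cutoff = series.quantile(perc)
    # True if val lies above the perc-quantile of the column
    return bool(val > cutoff)


s = pd.Series(range(1, 11))
print(above_quantile(s, 2.9))
```

For the values 1 to 10 and `perc=0.2`, the interpolated cutoff is 2.8, whereas indexing the sorted list at `int(10 * 0.2)` gives 3, so a value such as 2.9 is classified differently by the two approaches. Passing `interpolation="lower"` or `"higher"` to `quantile` keeps the cutoff on an observed value.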
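The searchsorted approach also extends to several query values at once and to values that do occur in the column, where ties matter. A minimal sketch, where `percentile_rank` and its `kind` options are hypothetical names chosen for illustration: `side="left"` counts elements strictly below each value, `side="right"` counts elements at or below it, and `"mean"` averages the two:

```python
import numpy as np
import pandas as pd


def percentile_rank(series: pd.Series, values, kind: str = "strict") -> np.ndarray:
    # searchsorted only gives meaningful positions on sorted input
    sorted_vals = np.sort(series.to_numpy())
    n = len(sorted_vals)
    values = np.asarray(values)
    # side="left": number of elements strictly less than each value
    below = np.searchsorted(sorted_vals, values, side="left")
    # side="right": number of elements less than or equal to each value
    at_or_below = np.searchsorted(sorted_vals, values, side="right")
    if kind == "strict":
        counts = below
    elif kind == "weak":
        counts = at_or_below
    elif kind == "mean":
        counts = (below + at_or_below) / 2
    else:
        raise ValueError("kind must be 'strict', 'weak' or 'mean'")
    # Convert counts to percentages of the column length
    return counts * 100.0 / n


s = pd.Series([1, 2, 2, 3, 4])
print(percentile_rank(s, [2, 3.5], kind="strict"))
```

For a value that is not in the column, as in the question, all three kinds agree (3.5 scores 80 here); they differ only for tied values such as 2, which scores 20, 60 or 40 depending on `kind`. If SciPy is available, `scipy.stats.percentileofscore` offers the same `"strict"`, `"weak"` and `"mean"` options, plus a default `"rank"` mode.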