Search code examples
pythonpandasstatisticsquantilepercentile

Find percentile stats of a given column


I have a pandas data frame my_df, where I can find the mean(), median(), mode() of a given column:

my_df['field_A'].mean()
my_df['field_A'].median()
my_df['field_A'].mode()

I am wondering is it possible to find more detailed statistics such as the 90th percentile?


Solution

    • You can use the pandas.DataFrame.quantile() function.
      • If you look at the API for quantile(), you will see it takes an argument for how to do interpolation. If you want a quantile that falls between two positions in your data:
        • 'linear', 'lower', 'higher', 'midpoint', or 'nearest'.
        • By default, it performs linear interpolation.
        • These interpolation methods are discussed in the Wikipedia article for percentile
    import pandas as pd
    import numpy as np
    
    # sample data 
    np.random.seed(2023)  # for reproducibility
    data = {'Category': np.random.choice(['hot', 'cold'], size=(10,)),
            'field_A': np.random.randint(0, 100, size=(10,)),
            'field_B': np.random.randint(0, 100, size=(10,))}
    df = pd.DataFrame(data)
    
    df.field_A.mean()  # Same as df['field_A'].mean()
    # 51.1
    
    df.field_A.median() 
    # 50.0
    
    # You can call `quantile(i)` to get the i'th quantile,
    # where `i` should be a fractional number.
    
    df.field_A.quantile(0.1)  # 10th percentile
    # 15.6
    
    df.field_A.quantile(0.5)  # same as median
    # 50.0
    
    df.field_A.quantile(0.9)  # 90th percentile
    # 88.8
    
    df.groupby('Category').field_A.quantile(0.1)
    #Category
    #cold    28.8
    #hot      8.6
    #Name: field_A, dtype: float64
    

    df

      Category  field_A  field_B
    0     cold       96       58
    1     cold       22       28
    2      hot       17       81
    3     cold       53       71
    4     cold       47       63
    5      hot       77       48
    6     cold       39       32
    7      hot       69       29
    8      hot       88       49
    9      hot        3       49