pythonpandasstatisticsquantilepercentile

# Find percentile stats of a given column

I have a `pandas` data frame `my_df`, where I can find the mean(), median(), mode() of a given column:

``````my_df['field_A'].mean()
my_df['field_A'].median()
my_df['field_A'].mode()
``````

I am wondering is it possible to find more detailed statistics such as the 90th percentile?

Solution

• You can use the `pandas.DataFrame.quantile()` function.
• If you look at the API for `quantile()`, you will see it takes an argument for how to do interpolation. If you want a quantile that falls between two positions in your data:
• 'linear', 'lower', 'higher', 'midpoint', or 'nearest'.
• By default, it performs linear interpolation.
• These interpolation methods are discussed in the Wikipedia article for percentile
``````import pandas as pd
import numpy as np

# sample data
np.random.seed(2023)  # for reproducibility
data = {'Category': np.random.choice(['hot', 'cold'], size=(10,)),
'field_A': np.random.randint(0, 100, size=(10,)),
'field_B': np.random.randint(0, 100, size=(10,))}
df = pd.DataFrame(data)

df.field_A.mean()  # Same as df['field_A'].mean()
# 51.1

df.field_A.median()
# 50.0

# You can call `quantile(i)` to get the i'th quantile,
# where `i` should be a fractional number.

df.field_A.quantile(0.1)  # 10th percentile
# 15.6

df.field_A.quantile(0.5)  # same as median
# 50.0

df.field_A.quantile(0.9)  # 90th percentile
# 88.8

df.groupby('Category').field_A.quantile(0.1)
#Category
#cold    28.8
#hot      8.6
#Name: field_A, dtype: float64
``````

### `df`

``````  Category  field_A  field_B
0     cold       96       58
1     cold       22       28
2      hot       17       81
3     cold       53       71
4     cold       47       63
5      hot       77       48
6     cold       39       32
7      hot       69       29
8      hot       88       49
9      hot        3       49
``````