Quartiles and other summary statistics in a groupby on multiple keys in pandas (Python)

The DataFrame df has the following columns:

  Key1   Key2   Val1

I want to group df by Key1 and Key2 and summarize the Val1 values of each group: the count, the sum and the three quartiles.

Expected output: df2 (grouped by Key1 and Key2)

Key1 Key2 Val1  Count Sum Q1  Q2  Q3

Solution

Create a helper that returns a percentile function, so that Q1, Q2 and Q3 (the 25th, 50th and 75th percentiles) can each be computed with np.percentile:

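import numpy as np

def percentile(n):
    # Return a function that computes the n-th percentile of its input.
    def percentile_(x):
        return np.percentile(x, n)
    # Give each generated function a distinct, readable name.
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_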
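
As a quick check of what the helper returns, the sketch below calls it on a small Series. The names q1 and values are only illustrative and are not part of the question.

```python
import numpy as np
import pandas as pd

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Build a function for the 25th percentile (illustrative name).
q1 = percentile(25)

# Illustrative data: the Val1 values of one group.
values = pd.Series([1, 2, 3, 4])

print(q1.__name__)   # percentile_25
print(q1(values))    # 1.75 with NumPy's default linear interpolation
```

Each call to percentile(n) produces an independent function, so percentile(25), percentile(50) and percentile(75) can be passed to the same aggregation side by side.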

Now use named aggregation to obtain the count, the sum and the three quartile columns:

df2 = (df.groupby(["Key1", "Key2"])
         .agg(Count=('Val1', 'count'),
              Sum=('Val1', 'sum'),
              Q1=('Val1', percentile(25)),  # first quartile
              Q2=('Val1', percentile(50)),  # second quartile (median)
              Q3=('Val1', percentile(75)))  # third quartile
         .reset_index())                    # turn Key1 and Key2 back into columns
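
If you would rather not write a helper, one possible alternative is the groupby's built-in quantile method. The sketch below is only one way to arrange it and assumes a reasonably recent pandas (one where groupby quantile accepts a list of quantiles); the names grouped, quartiles, totals and df2_alt are illustrative, and the frame is the small eight-row example used in this answer.

```python
import pandas as pd

# The small example frame used in this answer.
df = pd.DataFrame({
    "Key1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Key2": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Val1": [1, 2, 3, 4, 5, 6, 7, 8],
})

grouped = df.groupby(["Key1", "Key2"])["Val1"]

# quantile() with a list gives one row per (group, quantile);
# unstack() pivots the quantile level into columns.
quartiles = grouped.quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["Q1", "Q2", "Q3"]

# Count and Sum via named aggregation on the same grouping.
totals = grouped.agg(Count="count", Sum="sum")

# Both results share the (Key1, Key2) index, so they can be joined directly.
df2_alt = totals.join(quartiles).reset_index()
print(df2_alt)
```

pandas' quantile and np.percentile both default to linear interpolation, so the two approaches should give the same quartiles.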

Example

Suppose df is:

Key1   Key2  Val1
a       a     1
a       a     2
a       a     3
a       a     4
b       b     5
b       b     6
b       b     7
b       b     8

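df2 then has the following columns, grouped on Key1 and Key2 (np.percentile uses linear interpolation by default, so the quartiles can fall between observed values):

Key1 Key2  Count  Sum   Q1    Q2   Q3
 a    a     4     10    1.75  2.5  3.25
 b    b     4     26    5.75  6.5  7.25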
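
To try this end to end, the following self-contained sketch rebuilds the example frame and runs the named aggregation with the percentile helper. The reset_index() call at the end is an addition here so that Key1 and Key2 come back as ordinary columns rather than as the index.

```python
import numpy as np
import pandas as pd

def percentile(n):
    # Return a function that computes the n-th percentile of its input.
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# The example frame from this answer.
df = pd.DataFrame({
    "Key1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Key2": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Val1": [1, 2, 3, 4, 5, 6, 7, 8],
})

df2 = (df.groupby(["Key1", "Key2"])
         .agg(Count=("Val1", "count"),
              Sum=("Val1", "sum"),
              Q1=("Val1", percentile(25)),
              Q2=("Val1", percentile(50)),
              Q3=("Val1", percentile(75)))
         .reset_index())  # added: make Key1 and Key2 regular columns

print(df2)
```

For group (a, a) this gives Count 4, Sum 10 and quartiles 1.75, 2.5 and 3.25; for group (b, b) it gives Count 4, Sum 26 and quartiles 5.75, 6.5 and 7.25.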