python pandas group-by quartile

Quartiles and other parameters in groupby on multiple keys in pandas python


df has the following columns:

  Key1   Key2   Val1  

Output needed: I want to group by Key1 and Key2 so that the Val1 values in each group are aggregated together.

Expected output: df2 [groupby on Key1 and Key2]

Key1 Key2 Val1  Count Sum Q1  Q2  Q3

Solution

  • Create a helper that builds a percentile function, so that Q1, Q2 and Q3 can be computed as the 25th, 50th and 75th percentiles:

    import numpy as np

    def percentile(n):
        # Return a function that computes the n-th percentile of its input.
        def percentile_(x):
            return np.percentile(x, n)
        percentile_.__name__ = 'percentile_%s' % n
        return percentile_
    

    Now use named aggregation (available in pandas 0.25 and later) to obtain the count, the sum and the three quartile columns:

    df2 = (df.groupby(["Key1", "Key2"])
             .agg(Count=('Val1', 'count'),
                  Sum=('Val1', 'sum'),
                  Q1=('Val1', percentile(25)),   # first quartile
                  Q2=('Val1', percentile(50)),   # second quartile (median)
                  Q3=('Val1', percentile(75))))  # third quartile
    

    Example: suppose df is

    Key1   Key2  Val1
    a       a     1
    a       a     2
    a       a     3
    a       a     4
    b       b     5
    b       b     6
    b       b     7
    b       b     8
    

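    df2 will then have the columns below, grouped on Key1 and Key2. np.percentile interpolates linearly by default, so the quartiles need not be values that occur in the data.

    Key1 Key2  Count  Sum   Q1    Q2   Q3
     a    a     4     10    1.75  2.5  3.25
     b    b     4     26    5.75  6.5  7.25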
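
The whole example can be put together as a minimal, self-contained sketch. It assumes pandas 0.25 or later (for named aggregation) and NumPy's default linear interpolation in np.percentile. The reset_index() call is an addition, not part of the answer above; it turns the group keys back into ordinary columns.

```python
import numpy as np
import pandas as pd


def percentile(n):
    """Return an aggregation function that computes the n-th percentile."""
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = "percentile_%s" % n
    return percentile_


# Sample data from the example: two groups of four values each.
df = pd.DataFrame({
    "Key1": list("aaaabbbb"),
    "Key2": list("aaaabbbb"),
    "Val1": [1, 2, 3, 4, 5, 6, 7, 8],
})

df2 = (
    df.groupby(["Key1", "Key2"])
      .agg(
          Count=("Val1", "count"),
          Sum=("Val1", "sum"),
          Q1=("Val1", percentile(25)),
          Q2=("Val1", percentile(50)),
          Q3=("Val1", percentile(75)),
      )
      .reset_index()  # assumption: expose Key1/Key2 as regular columns
)

print(df2)
```

If a different quartile convention is wanted (for example, one that only returns observed values), np.percentile accepts an interpolation option, though its keyword name has changed between NumPy versions, so check the documentation for the installed release.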