df columns are as below:
Key1 Key2 Val1
Output needed: a groupby in Python on the keys Key1 and Key2, with the values of the Val1 column aggregated within each group.
Expected output: df2 (groupby on Key1 and Key2)
Key1 Key2 Val1 Count Sum Q1 Q2 Q3
Create a function to calculate Q1, Q2 and Q3 (the 25th, 50th and 75th percentiles) as below:
import numpy as np

def percentile(n):
    # Return a function that computes the n-th percentile of its input
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_
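To see what this factory returns, here is a small sketch. The list [1, 2, 3, 4] and the name q1 are made up for illustration. Note that np.percentile interpolates linearly by default, so the result does not have to be one of the input values.

```python
import numpy as np

def percentile(n):
    # Build a function that computes the n-th percentile of its input
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Hypothetical input, only to show the behaviour
q1 = percentile(25)
print(q1.__name__)       # percentile_25
print(q1([1, 2, 3, 4]))  # 1.75
```

Each call to percentile(n) gives a separately named function, which keeps the aggregators distinguishable if you ever pass several of them to agg as a plain list.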
Now use named aggregation, as shown below, to get the count, the sum and the three quartile columns.
df2 = (df.groupby(["Key1", "Key2"])
         .agg(Count=('Val1', 'count'),
              Sum=('Val1', 'sum'),
              Q1=('Val1', percentile(25)),  # first quartile
              Q2=('Val1', percentile(50)),  # second quartile (median)
              Q3=('Val1', percentile(75)))  # third quartile
         .reset_index())                    # turn Key1 and Key2 back into columns
For example, suppose df is:
Key1 Key2 Val1
a a 1
a a 2
a a 3
a a 4
b b 5
b b 6
b b 7
b b 8
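df2 will then have the columns below, grouped on the keys Key1 and Key2:
Key1 Key2 Count Sum Q1 Q2 Q3
a a 4 10 1.75 2.5 3.25
b b 4 26 5.75 6.5 7.25
Q1 and Q3 fall between data points because np.percentile interpolates linearly by default.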
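If you would rather not write the helper, one possible alternative (a different technique from the one above) is the built-in quantile method on the grouped column. This is only a sketch on the sample df from the example; the names g, stats and quartiles are made up for illustration.

```python
import pandas as pd

# Sample data from the example above
df = pd.DataFrame({
    "Key1": list("aaaabbbb"),
    "Key2": list("aaaabbbb"),
    "Val1": [1, 2, 3, 4, 5, 6, 7, 8],
})

g = df.groupby(["Key1", "Key2"])["Val1"]

# Count and sum per group
stats = g.agg(["count", "sum"]).rename(columns={"count": "Count", "sum": "Sum"})

# One row per (group, quantile), then pivot the quantile level into columns
quartiles = g.quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["Q1", "Q2", "Q3"]

# Combine on the (Key1, Key2) index and turn the keys back into columns
df2 = stats.join(quartiles).reset_index()
print(df2)
```

The quantile method also interpolates linearly by default, so it should give the same quartiles as the np.percentile version.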