Tags: python, pandas, pandas-groupby, tail

How to calculate the sum, mean, or median of the tail of each group when aggregating pandas data in Python


I have data like the following in a pandas DataFrame:

A  B  C  D  E  F  G
1  1  2  3  1  4  2
1  1  2  4  5  6  7
1  1  2  3  2  3  2
1  1  2  4  5  6  7
2  1  2  3  2  3  4
2  1  2  3  4  3  3
2  1  2  4  5  6  7

Here agg_lvl=['A','B','C'].

I want to calculate the mean, median, or sum of the G column using only the tail(2) records (the last two rows) of each group when the data is grouped by agg_lvl.

And my expected output is like this:

Expected output for mean:

A  B  C  G
1  1  2  4.5
2  1  2  5.0

The output has the same layout for median and sum, with the median or sum values in place of the mean.

I tried the following code, but I didn't get the expected output.

df.groupby(agg_lvl,as_index=False).tail(2).agg({'G':'mean'})

Can anyone help me tackle this issue?

Thanks in advance.


Solution

  • Use GroupBy.transform instead of agg to return a new column with the same shape as the DataFrame filtered by tail:

    agg_lvl=['A','B','C']
    df = df.groupby(agg_lvl,as_index=False).tail(2)
    df['G'] = df.groupby(agg_lvl)['G'].transform('mean')
    print (df)
       A  B  C  D  E  F    G
    2  1  1  2  3  2  3  4.5
    3  1  1  2  4  5  6  4.5
    5  2  1  2  3  4  3  5.0
    6  2  1  2  4  5  6  5.0
    

    EDIT: To get one row per group, as in the expected output, group the tail rows again and aggregate:

    df = df.groupby(agg_lvl,as_index=False).tail(2).groupby(agg_lvl,as_index=False)['G'].mean()
    print (df)
       A  B  C    G
    0  1  1  2  4.5
    1  2  1  2  5.0
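
For reference, here is a minimal self-contained sketch of the same idea, rebuilding the sample data from the question and computing mean, median, and sum in one pass. The variable names `tail_rows`, `overall`, and `result` are illustrative and not from the original post. It also shows why the attempt in the question does not work: `tail(2)` returns an ordinary DataFrame, so a following `agg` averages over all remaining rows rather than per group.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2],
    'B': [1, 1, 1, 1, 1, 1, 1],
    'C': [2, 2, 2, 2, 2, 2, 2],
    'D': [3, 4, 3, 4, 3, 3, 4],
    'E': [1, 5, 2, 5, 2, 4, 5],
    'F': [4, 6, 3, 6, 3, 3, 6],
    'G': [2, 7, 2, 7, 4, 3, 7],
})
agg_lvl = ['A', 'B', 'C']

# Keep only the last two rows of each group; the result is a plain
# DataFrame, so the grouping information is no longer attached to it.
tail_rows = df.groupby(agg_lvl).tail(2)

# The attempt from the question: one mean over all four remaining rows.
overall = tail_rows.agg({'G': 'mean'})

# Group the tail rows again to get one row per group, with all three
# statistics side by side.
result = (
    tail_rows.groupby(agg_lvl)['G']
    .agg(['mean', 'median', 'sum'])
    .reset_index()
)
print(result)
```

Passing a single function name, for example `.agg('median')`, instead of the list gives one statistic at a time if the three-column layout is not wanted.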