Search code examples
pythonpandaspandas-groupbytransform

Aggregate only one of the duplicated values with groupby pandas


I have the following data with the last column as the desired output:

activity teacher group students the desired column
One A a 3 5
One B b 2 5
two A c 7 7
One D a 3 5
two C c 7 7

I want to group by the activity and returen the number of students without dublicate the student when we have more than one teacher for them. I tried the following but it's dublicate the sum with for the same group.

df.groupby('activity').students.transform('sum')

the output of this looks like:

activity teacher group students the output column
One A a 3 8
One B b 2 8
two A c 7 14
One A a 3 8
two C c 7 14

thank you in advance for any suggestion.


Solution

  • IIUC:

    x = (
        df.drop_duplicates(subset=["activity", "group"])
        .groupby("activity")["students"]
        .sum()
    )
    df["the desired column"] = df["activity"].map(x)
    print(df)
    

    Prints:

      activity teacher group  students  the desired column
    0      One       A     a         3                   5
    1      One       B     b         2                   5
    2      two       A     c         7                   7
    3      One       D     a         3                   5
    4      two       C     c         7                   7