python-3.x pandas dataframe pandas-groupby

Pandas groupby: coordinates of current group


Suppose I have a data frame

import pandas as pd
df = pd.DataFrame({'group':['A','A','B','B','C','C'],'score':[1,2,3,4,5,6]})

At first, say, I want to compute the groups' sums of scores. I usually do

def group_func(x):
    d = {}
    d['sum_scores'] = x['score'].sum()
    return pd.Series(d)
df.groupby('group').apply(group_func).reset_index()

Now suppose I want to modify group_func but this modification requires that I know the group identity of the current input x. I tried x['group'] and x[group].iloc[0] within the function's definition and neither worked.

Is there a way for the function group_func(x) to know the defining coordinates of the current input x?

In this toy example, say, I just want to get:

pd.DataFrame({'group':['A','B','C'],'sum_scores':[3,7,11],'name_of_group':['A','B','C']})

where obviously the last column just repeats the first one. I'd like to know how to make this last column using a function like group_func(x). Like: as group_func processes the x that corresponds to group 'A' and generates the value 3 for sum_scores, how do I extract the current identity 'A' within the local scope of group_func?


Solution

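  • Just use x.name: inside the applied function, each group's sub-frame carries its group key in the .name attribute.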

    def group_func(x):
        d = {}
        d['sum_scores'] = x['score'].sum()
        d['group_name'] = x.name  # alternative: d['group_name'] = x['group'].iloc[0]
        return pd.Series(d)

    df.groupby('group').apply(group_func)
    Out[63]: 
           sum_scores group_name
    group                       
    A               3          A
    B               7          B
    C              11          C
    

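    Your own code can be fixed by adding the missing quotes, as in the comment on the marked line above: x['group'].iloc[0] instead of x[group].iloc[0].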
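
    If you would rather not rely on the .name attribute, one alternative is to iterate over the groupby object yourself and pass the key in explicitly. This is only a sketch, not part of the answer above: the helper name summarize and its two-argument signature are made up for illustration. It builds the frame asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'score': [1, 2, 3, 4, 5, 6]})

def summarize(key, x):
    # Hypothetical helper: `key` is the group label, `x` is that group's sub-frame.
    return {'group': key,
            'sum_scores': x['score'].sum(),
            'name_of_group': key}

# Iterating over a groupby object yields (key, sub-frame) pairs,
# so the group identity is available without touching x.name.
result = pd.DataFrame([summarize(key, x) for key, x in df.groupby('group')])
print(result)
```

    Because the key arrives as an ordinary argument, the function does not depend on how apply hands over each group.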