Search code examples
pandasgroup-by

Python: obtaining a value per group using a hierarchical structure in pandas


I have the following data:

df =
id interval name
1  8        A
1  8        B
1  8        C
2  2        B
2  2        C
2  3        A

I want to have a name per id and interval. In particular, I want A over B and C and B over C. Such as:

df =
id interval name
1  8        A
2  2        B
2  3        A

Solution

  • Use ordered Categorical with DataFrameGroupBy.idxmin for indexes by minimal values:

    df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)
    
    df = df.loc[df.groupby(['id','interval'])['name'].idxmin()]
    

    Or:

    df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)
    
    df = df.sort_values('name').drop_duplicates(['id','interval']).sort_index(ignore_index=True)
    print (df)
       id  interval name
    0   1         8    A
    1   2         2    B
    2   2         3    A
    

    Another idea with mapping with dictionary generated by enumerate:

    d = {v: k for k, v in enumerate(['A','B','C'])}
    
    df = df.loc[df['name'].map(d).groupby([df['id'],df['interval']]).idxmin()]
    print (df)
       id  interval name
    0   1         8    A
    3   2         2    B
    5   2         3    A