I have the following data:
df =
id interval name
1 8 A
1 8 B
1 8 C
2 2 B
2 2 C
2 3 A
I want to have a name per id and interval. In particular, I want A over B and C and B over C. Such as:
df =
id interval name
1 8 A
2 2 B
2 3 A
Use ordered Categorical
with DataFrameGroupBy.idxmin
for indexes by minimal values:
df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)
df = df.loc[df.groupby(['id','interval'])['name'].idxmin()]
Or:
df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)
df = df.sort_values('name').drop_duplicates(['id','interval']).sort_index(ignore_index=True)
print (df)
id interval name
0 1 8 A
1 2 2 B
2 2 3 A
Another idea with mapping with dictionary generated by enumerate
:
d = {v: k for k, v in enumerate(['A','B','C'])}
df = df.loc[df['name'].map(d).groupby([df['id'],df['interval']]).idxmin()]
print (df)
id interval name
0 1 8 A
3 2 2 B
5 2 3 A