I want to extract from a pandas DataFrame only the first row for each distinct value in one of the columns. For example, from:
df
   col_A col_B
0      1     x
1      2    xx
2      3    xx
3      4     y
4      5     y
to
df1
   col_A col_B
0      1     x
1      2    xx
2      4     y
firsts = df.groupby('col_B', as_index=False).first()
Output:
>>> firsts
  col_B  col_A
0     x      1
1    xx      2
2     y      4
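For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the example in the question, so the construction itself is an assumption about how the data was created:

```python
import pandas as pd

# Example data rebuilt from the question (construction is assumed).
df = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["x", "xx", "xx", "y", "y"],
})

# One row per distinct col_B value; as_index=False keeps col_B as a
# regular column, which is why it ends up first in the result.
firsts = df.groupby("col_B", as_index=False).first()
print(firsts)
```

Two things to keep in mind: `first()` takes the first non-null value of each column within a group, so with missing data the result can combine values from different rows; and `groupby` sorts the group keys by default, so pass `sort=False` if you need the groups in order of appearance.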
If the order of the columns is important, select the original rows instead of aggregating:
firsts = df.groupby('col_B').head(1).reset_index(drop=True)
Output:
>>> firsts
   col_A col_B
0      1     x
1      2    xx
2      4     y
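Another option, offered as a sketch rather than the only way to do it, is `drop_duplicates`, which keeps whole rows and leaves the column order untouched. The DataFrame below is again rebuilt by hand from the question, and the name `firsts_dd` is just illustrative:

```python
import pandas as pd

# Example data rebuilt from the question (construction is assumed).
df = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["x", "xx", "xx", "y", "y"],
})

# Keep the first row seen for each col_B value, then renumber the index
# so it runs 0..n-1 like the desired output in the question.
firsts_dd = (
    df.drop_duplicates(subset="col_B", keep="first")
      .reset_index(drop=True)
)
print(firsts_dd)
```

Unlike `groupby(...).first()`, this keeps each selected row exactly as it appears in `df`, including any missing values.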