Tags: python, pandas, rows

Get the first row of each group of unique values in another column


I want to extract from a pandas DataFrame only the first row for each distinct value in one of its columns. For example, I want to go from:

df
   col_A col_B
0      1     x
1      2    xx
2      3    xx
3      4     y
4      5     y

to

df1
   col_A col_B
0      1     x
1      2    xx
2      4     y

Solution

  • Use groupby + first:

    firsts = df.groupby('col_B', as_index=False).first()
    

    Output:

    >>> firsts
      col_B  col_A
    0     x      1
    1    xx      2
    2     y      4
    

    If the order of the columns is important:

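    # Reselect the columns in their original order after grouping.
    # (Indexing df.loc with the grouped result's 0..n-1 index would just
    # return the first n rows of df, not the first row of each group.)
    firsts = df.groupby('col_B', as_index=False).first()[df.columns]
    

    Output:

    >>> firsts
       col_A col_B
    0      1     x
    1      2    xx
    2      4     y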
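
As a possible alternative to the groupby + first approach above, the sketch below rebuilds the sample frame from the question and picks the first row per `col_B` value with `drop_duplicates` and with `groupby(...).head(1)`. The variable names (`firsts_dedup`, `firsts_head`, `firsts_reset`) are made up for illustration. One difference worth knowing: `first()` returns the first non-null value of each column within a group, while these two options return the first row exactly as it appears, and they keep the original column order and index labels.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["x", "xx", "xx", "y", "y"],
})

# Option 1: keep the first row seen for each distinct col_B value.
# Column order and the original index labels are preserved.
firsts_dedup = df.drop_duplicates(subset="col_B", keep="first")

# Option 2: the same rows via groupby; head(1) does not reorder
# rows or columns and also keeps the original index labels.
firsts_head = df.groupby("col_B").head(1)

# Optional: renumber the index from 0 to match the layout in the question.
firsts_reset = firsts_dedup.reset_index(drop=True)

print(firsts_reset)
```

If the original row labels matter downstream (for example to join back to `df`), skipping the `reset_index` step keeps them.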