Tags: python, pandas, unique

Return order of unique values in a pandas column


I am trying to find a more efficient way to return the index of each unique value in a pandas df.

For the df below, I want to return the index of the first occurrence of each unique value.

import pandas as pd
import numpy as np

d = {
    'Day': ['Mon', 'Mon', 'Tues', 'Mon', 'Tues', 'Wed'],
}

df = pd.DataFrame(data=d)

I can manually count the index of each unique value and return it as below:

first = df.iloc[0].Day
second = df.iloc[2].Day
third = df.iloc[5].Day

I was thinking of doing something like

first = (df['Day'] == 'Mon')

But I would still have to change this to find the 2nd and 3rd unique values. Is there a more efficient method?


Solution

  • To keep only the values that occur exactly once, use drop_duplicates with keep=False:

    print (df['Day'].drop_duplicates(keep=False))
    5    Wed
    Name: Day, dtype: object
    
    print (df['Day'].drop_duplicates(keep=False).index)
    Int64Index([5], dtype='int64')
    

    Or:

    print (df.index[~df['Day'].duplicated(keep=False)])
    Int64Index([5], dtype='int64')
    

    To keep the first occurrence of each unique value, use drop_duplicates with its default keep='first':

    print (df['Day'].drop_duplicates())
    0     Mon
    2    Tues
    5     Wed
    Name: Day, dtype: object
    
    print (df['Day'].drop_duplicates().index)
    Int64Index([0, 2, 5], dtype='int64')
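
To look up the first position of a particular day instead of reading it off the printed output, the same idea can be sketched with duplicated(). This is only an illustration built on the df from the question; the names first_mask, first_idx and first_pos are hypothetical and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Day': ['Mon', 'Mon', 'Tues', 'Mon', 'Tues', 'Wed']})

# duplicated() marks every repeat of a value after its first occurrence,
# so negating it leaves True only at the first occurrence of each value.
first_mask = ~df['Day'].duplicated()

# Index labels of the first occurrences, in order of appearance.
first_idx = df.index[first_mask].tolist()

# Hypothetical lookup table: value -> index label of its first occurrence.
first_pos = {day: idx for idx, day in df.loc[first_mask, 'Day'].items()}

print(first_idx)   # [0, 2, 5]
print(first_pos)   # {'Mon': 0, 'Tues': 2, 'Wed': 5}
```

With such a mapping, first_pos['Tues'] gives 2 directly, so there is no need to hard-code a separate iloc call for the 2nd or 3rd unique value.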