python pandas unique

Cumulative number of unique elements for a pandas DataFrame


I have a pandas DataFrame:

id tag
1  A
1  A
1  B
1  C
1  A
2  B
2  C  
2  B 

I want to add a column that computes the cumulative number of unique tags within each id. More specifically, I would like to have:

id tag count
1  A   1
1  A   1
1  B   2
1  C   3
1  A   3
2  B   1
2  C   2
2  B   2

For a given id, count will be non-decreasing. Thanks for your help!
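
For reference, the example above can be reproduced with something like the following. This is only a sketch of the sample data; the variable names `df` and `expected` are chosen here for illustration.

```python
import pandas as pd

# Example input from the question
df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 1, 2, 2, 2],
    "tag": ["A", "A", "B", "C", "A", "B", "C", "B"],
})

# Desired cumulative count of unique tags within each id
expected = pd.Series([1, 1, 2, 3, 3, 1, 2, 2], name="count")
```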


Solution

  • I think this does what you want:

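    # Number the first occurrence of each (id, tag) pair within its id, starting at 1
    unique_count = df.drop_duplicates().groupby('id').cumcount() + 1
    # Repeated rows become NaN after reindexing; forward-fill them, then cast back to int
    df['count'] = unique_count.reindex(df.index).ffill().astype(int)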
    

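    The `+ 1` is there because `cumcount()` starts at zero. This only works if the rows for each id are contiguous, for example when the dataframe is sorted by id; otherwise the forward fill carries a count over from a different id. Was that intended? You can always sort by id beforehand (use a stable sort so the row order within each id is preserved).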
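
If the rows for an id might not be contiguous, one possible alternative (a different technique from the snippet above, offered only as a sketch) is to flag the first occurrence of each (id, tag) pair with `duplicated` and take a running sum of those flags within each id. The helper name `cumulative_unique` is made up for illustration, and the column names `id` and `tag` are assumed from the question.

```python
import pandas as pd

def cumulative_unique(df, group_col="id", value_col="tag"):
    """Running count of distinct value_col entries within each group_col."""
    # 1 the first time a (group, value) pair appears, 0 for every repeat
    first_seen = (~df.duplicated(subset=[group_col, value_col])).astype(int)
    # Summing the flags within each group gives the number of distinct values so far
    return first_seen.groupby(df[group_col]).cumsum()

# Sample data from the question
df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 1, 2, 2, 2],
    "tag": ["A", "A", "B", "C", "A", "B", "C", "B"],
})
df["count"] = cumulative_unique(df)
print(df["count"].tolist())  # [1, 1, 2, 3, 3, 1, 2, 2]
```

Because the sum is taken per group rather than forward-filled, this should not need the frame to be sorted by id, and the result stays an integer column.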