python pandas string dataframe apply

How to find the number of unique values in comma-separated strings stored in a pandas DataFrame column?


x              Unique_in_x
5,5,6,7,8,6,8  4
5,9,8,0        4
5,9,8,0        4
3,2            2
5,5,6,7,8,6,8  4

Unique_in_x is my expected column. Sometimes the x column might also contain strings.


Solution

  • You can use a list comprehension with a set:

    df['Unique_in_x'] = [len(set(x.split(','))) for x in df['x']]
    

    Or split each string into columns and count the distinct values per row with nunique:

    df['Unique_in_x'] = df['x'].str.split(',', expand=True).nunique(axis=1)
    

    Output:

                   x  Unique_in_x
    0  5,5,6,7,8,6,8            4
    1        5,9,8,0            4
    2        5,9,8,0            4
    3            3,2            2
    4  5,5,6,7,8,6,8            4
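
For reference, here is a self-contained sketch that rebuilds the sample data from the question and runs both approaches side by side. The `count_unique` helper and the `mixed` series are hypothetical additions, not part of the original answer; they assume that "x might be string also" means cells can hold text tokens, non-string values, or stray spaces.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"x": ["5,5,6,7,8,6,8", "5,9,8,0", "5,9,8,0", "3,2", "5,5,6,7,8,6,8"]}
)

# Approach 1: split each string and count the distinct tokens with a set.
df["Unique_in_x"] = [len(set(s.split(","))) for s in df["x"]]

# Approach 2: split into one column per token, then count distinct
# non-missing values in each row (shorter rows are padded with missing values,
# which nunique ignores by default).
via_nunique = df["x"].str.split(",", expand=True).nunique(axis=1)


def count_unique(value, sep=","):
    """Hypothetical helper: tolerate non-string cells and stray whitespace.

    Note: a missing value would be converted to its text form and counted
    as one token, so filter those out first if that matters.
    """
    tokens = (t.strip() for t in str(value).split(sep))
    return len({t for t in tokens if t})


# Illustrative series mixing text tokens, an integer cell, and extra spaces.
mixed = pd.Series(["a, b,a", 7, "3,2,3"])
mixed_counts = mixed.map(count_unique)

print(df)
print(mixed_counts.tolist())
```

The list comprehension is usually the simpler choice when every cell is a clean string; the helper trades a little speed for tolerance of messy input.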