Search code examples
python-3.xdataframeunique

Getting nunique percentage instead of count from a column


I have a dataframe with a column with repeating values and another column with categories against them.

Sample dataframe:

df  = pd.DataFrame({'col1': [101,102,103,101,104,106,102,103,104,105,106],
                    'col2': ['A','B','A','A','A','B','B','A','A','B','B'],
                    'col3': [123,234,345,456,567,678,789,890,912,123,234],
                   })

What I need is to get the a table with unique values from col2 as index or as a column, and in second column the count of unique values under that category in col1.

Expected Outcome for above sample:

A    0.5
B    0.5

I can get the unique values count under col2 using this. However, I need this in percentage format.

df.groupby(['col2'])['col1'].nunique()

Can I do that by adding a function to .apply() in this line of code itself or through some other method or function?

I do not want to create a new variable for this nunique() series/dataframe and then create a new column for percentage using operations through another line of code. Any way to do this in the above line of code itself?


Solution

  • I could find one way to do this with a single statement:

    (df.groupby(['col2'])['col1'].nunique() / df.groupby(['col2'])['col1'].nunique().sum())
    

    This will return the expected output of unique value percentages instead of number (count) as under:

    A    0.5
    B    0.5