I have a dataframe with a column with repeating values and another column with categories against them.
Sample dataframe:
df = pd.DataFrame({'col1': [101,102,103,101,104,106,102,103,104,105,106],
'col2': ['A','B','A','A','A','B','B','A','A','B','B'],
'col3': [123,234,345,456,567,678,789,890,912,123,234],
})
What I need is to get the a table with unique values from col2 as index or as a column, and in second column the count of unique values under that category in col1.
Expected Outcome for above sample:
A 0.5
B 0.5
I can get the unique values count under col2 using this. However, I need this in percentage format.
df.groupby(['col2'])['col1'].nunique()
Can I do that by adding a function to .apply() in this line of code itself or through some other method or function?
I do not want to create a new variable for this nunique() series/dataframe and then create a new column for percentage using operations through another line of code. Any way to do this in the above line of code itself?
I could find one way to do this with a single statement:
(df.groupby(['col2'])['col1'].nunique() / df.groupby(['col2'])['col1'].nunique().sum())
This will return the expected output of unique value percentages instead of number (count) as under:
A 0.5
B 0.5