I have a dataframe with the following structure:
Cluster 1 Cluster 2 Cluster 3
ID Name Revenue ID Name Revenue ID Name Revenue
1234 John 123 1235 Jane 761 1237 Mary 276
1376 Peter 254 1297 Paul 439 1425 David 532
However I am unsure how to perform basic functions like .unique
or .value_count
for columns as I am unsure how to refer to them in the code...
For example, if I want to see the unique values in the Cluster 2 Name column, how would I code that?
Usually I would type df.Name.unique()
or df['Name'].unique()
but neither of these work.
My original data looked like this:
ID Name Revenue Cluster
1234 John 123 1
1235 Jane 761 2
1237 Mary 276 3
1297 Paul 439 2
1376 Peter 254 1
1425 David 532 3
And I used this code to get me to my current point:
df = (df.set_index([df.groupby('Cluster').cumcount(), 'Cluster'])
.unstack()
.swaplevel(1,0, axis=1)
.sort_index(axis=1)
.rename(columns=lambda x: f'Cluster {x}', level=0))```
You just need to subset by the index in sequence.
So your first step would be to subset Cluster 2, then get unique names.
For example:
df["Cluster 2"]["Names"].unique()