Search code examples
pythonpandasjupyter-notebookmulti-index

How to view or amend values in a multi index dataframe in python


I have a dataframe with the following structure:

      Cluster 1                Cluster 2                Cluster 3
ID     Name     Revenue    ID    Name   Revenue    ID     Name   Revenue
1234   John     123        1235  Jane   761        1237   Mary   276
1376   Peter    254        1297  Paul   439        1425   David  532

However I am unsure how to perform basic functions like .unique or .value_count for columns as I am unsure how to refer to them in the code...

For example, if I want to see the unique values in the Cluster 2 Name column, how would I code that? Usually I would type df.Name.unique() or df['Name'].unique() but neither of these work.

My original data looked like this:

ID     Name     Revenue     Cluster
1234   John     123         1
1235   Jane     761         2
1237   Mary     276         3
1297   Paul     439         2
1376   Peter    254         1
1425   David    532         3

And I used this code to get me to my current point:

df = (df.set_index([df.groupby('Cluster').cumcount(), 'Cluster'])
        .unstack()
        .swaplevel(1,0, axis=1)
        .sort_index(axis=1)
        .rename(columns=lambda x: f'Cluster {x}', level=0))```

Solution

  • You just need to subset by the index in sequence.

    So your first step would be to subset Cluster 2, then get unique names.

    For example:

    df["Cluster 2"]["Names"].unique()