Search code examples
pythonpandasdataframemulti-index

Converting single index pandas df to multi-index df and then grouping all rows with the same index together on separate lines


I have a df that looks like this:

import pandas as pd

# Create df
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol', 'Alice', 'Carol', 'Alice', 'Carol', 'Matt'],
                   'Address': ['123 A St', '123 B St', '123 C St', '123 A St', '123 C St', '456 X St', '123 C St', '123 M St'],
                   'State': ['AZ', 'TX', 'CA', 'AZ', 'CA', 'AZ', 'CA', 'MA'],
                   'Car': ['GMC', 'Mazda', 'Tesla', 'Honda', 'Nissan', 'Subaru', 'Mazda', 'Buick'],
                   'Miles': [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888]})

# Display df
display(df)

Goal

I would like for the output to be a multi-index df using 'Name', 'Address', and 'State' that would look as follows:

Name Address State Car Miles
Alice 123 A St AZ GMC 1111
Honda 4444
Alice 456 X St AZ Subaru 6666
Bob 123 B St TX Mazda 2222
Carol 123 C St CA Tesla 3333
Nissan 5555
Mazda 7777
Matt 123 M St MA Buick 8888

Code attempted

I tried the following code, but it does not group all rows of data with the same multi-index values:

df = df.set_index(keys=['Name', 'Address', 'State'])

Thanks for any help you can provide.


Solution

  • You can sort your index after setting it, however you will end up with all the Alice values grouped together:

    df.set_index(keys=['Name', 'Address', 'State']).sort_index()
    

    Output:

                             Car  Miles
    Name  Address  State
    Alice 123 A St AZ        GMC   1111
                   AZ      Honda   4444
          456 X St AZ     Subaru   6666
    Bob   123 B St TX      Mazda   2222
    Carol 123 C St CA      Tesla   3333
                   CA     Nissan   5555
                   CA      Mazda   7777
    Matt  123 M St MA      Buick   8888
    

    To get something close to the result you want, you need to index on ['Address', 'State', 'Name'] and then sort on the Name first:

    df.set_index(keys=['Address', 'State', 'Name']).sort_index(level=2)
    

    Output:

                             Car  Miles
    Address  State Name
    123 A St AZ    Alice     GMC   1111
                   Alice   Honda   4444
    456 X St AZ    Alice  Subaru   6666
    123 B St TX    Bob     Mazda   2222
    123 C St CA    Carol   Tesla   3333
                   Carol  Nissan   5555
                   Carol   Mazda   7777
    123 M St MA    Matt    Buick   8888