I have a df that looks like this:
import pandas as pd
# Create df
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol', 'Alice', 'Carol', 'Alice', 'Carol', 'Matt'],
'Address': ['123 A St', '123 B St', '123 C St', '123 A St', '123 C St', '456 X St', '123 C St', '123 M St'],
'State': ['AZ', 'TX', 'CA', 'AZ', 'CA', 'AZ', 'CA', 'MA'],
'Car': ['GMC', 'Mazda', 'Tesla', 'Honda', 'Nissan', 'Subaru', 'Mazda', 'Buick'],
'Miles': [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888]})
# Display df
display(df)
Goal
I would like for the output to be a multi-index df using 'Name', 'Address', and 'State' that would look as follows:
Name | Address | State | Car | Miles |
---|---|---|---|---|
Alice | 123 A St | AZ | GMC | 1111 |
Honda | 4444 | |||
Alice | 456 X St | AZ | Subaru | 6666 |
Bob | 123 B St | TX | Mazda | 2222 |
Carol | 123 C St | CA | Tesla | 3333 |
Nissan | 5555 | |||
Mazda | 7777 | |||
Matt | 123 M St | MA | Buick | 8888 |
Code attempted
I tried the following code, but it does not group all rows of data with the same multi-index values:
df = df.set_index(keys=['Name', 'Address', 'State'])
Thanks for any help you can provide.
You can sort your index after setting it, however you will end up with all the Alice
values grouped together:
df.set_index(keys=['Name', 'Address', 'State']).sort_index()
Output:
Car Miles
Name Address State
Alice 123 A St AZ GMC 1111
AZ Honda 4444
456 X St AZ Subaru 6666
Bob 123 B St TX Mazda 2222
Carol 123 C St CA Tesla 3333
CA Nissan 5555
CA Mazda 7777
Matt 123 M St MA Buick 8888
To get something close to the result you want, you need to index on ['Address', 'State', 'Name']
and then sort on the Name
first:
df.set_index(keys=['Address', 'State', 'Name']).sort_index(level=2)
Output:
Car Miles
Address State Name
123 A St AZ Alice GMC 1111
Alice Honda 4444
456 X St AZ Alice Subaru 6666
123 B St TX Bob Mazda 2222
123 C St CA Carol Tesla 3333
Carol Nissan 5555
Carol Mazda 7777
123 M St MA Matt Buick 8888