I posted this on reddit some days ago, but haven't received any response.
Everything I've read online about the pandas MultiIndex makes it seem like a counterpart to a SQL composite index. Is this the correct understanding?
Additionally, MultiIndex is often described as hierarchical. This disrupts the analogy with a composite index. To me, that means a tree structure, with parent keys and child keys, possibly with a depth greater than 2. A composite index doesn't fit this picture. In the case of MultiIndexes, what are the parent/child keys?
I think you almost answered your question. The key difference between a SQL composite index and a MultiIndex in pandas is the hierarchy with lower-level indexes having more priority.
Consider the following example:
import pandas as pd
import numpy as np
multi_index = pd.MultiIndex.from_product([['East', 'West'], ['Retail', 'Corporate'], ['Shirts', 'Pants', 'Accessories']],
names=['Region', 'Customer', 'Product'])
df = pd.DataFrame(np.random.randn(12, 2), index=multi_index, columns=['Revenue', 'Profit'])
print(df)
Output
Revenue Profit
Region Customer Product
East Retail Shirts 0.469112 -0.282863
Pants -1.509059 -1.135632
Accessories -1.135632 -0.173215
Corporate Shirts 0.119209 -1.044236
Pants -0.861849 -2.104569
Accessories -0.721189 0.265599
West Retail Shirts -0.315196 0.854109
Pants 0.572192 -0.273712
Accessories 0.228440 0.574273
Corporate Shirts -0.284819 -0.671225
Pants 0.146182 0.056439
Accessories 0.423380 0.539674
In this example, we have three levels in the MultiIndex: