Search code examples
pythonpandasmulti-index

Is a pandas MultiIndex a counterpart to a SQL composite index?


I posted this on reddit some days ago, but haven't received any response.

Everything I've read online about the pandas MultiIndex makes it seem like a counterpart to a SQL composite index. Is this the correct understanding?

Additionally, MultiIndex is often described as hierarchical. This disrupts the analogy with a composite index. To me, that means a tree structure, with parent keys and child keys, possibly with a depth greater than 2. A composite index doesn't fit this picture. In the case of MultiIndexes, what are the parent/child keys?


Solution

  • I think you almost answered your question. The key difference between a SQL composite index and a MultiIndex in pandas is the hierarchy with lower-level indexes having more priority.

    Consider the following example:

    import pandas as pd
    import numpy as np
    
    multi_index = pd.MultiIndex.from_product([['East', 'West'], ['Retail', 'Corporate'], ['Shirts', 'Pants', 'Accessories']],
                                             names=['Region', 'Customer', 'Product'])
    
    df = pd.DataFrame(np.random.randn(12, 2), index=multi_index, columns=['Revenue', 'Profit'])
    print(df)
    

    Output

                                      Revenue     Profit
    Region Customer Product                               
    East   Retail   Shirts            0.469112 -0.282863
                    Pants            -1.509059 -1.135632
                    Accessories      -1.135632 -0.173215
          Corporate Shirts            0.119209 -1.044236
                    Pants            -0.861849 -2.104569
                    Accessories      -0.721189  0.265599
    West  Retail    Shirts           -0.315196  0.854109
                    Pants             0.572192 -0.273712
                    Accessories       0.228440  0.574273
          Corporate Shirts           -0.284819 -0.671225
                    Pants             0.146182  0.056439
                    Accessories       0.423380  0.539674
    

    In this example, we have three levels in the MultiIndex:

    1. The parent level is 'Region' with keys 'East' and 'West'.
    2. The second level is 'Customer' with keys 'Retail' and 'Corporate', nested under each 'Region'.
    3. The child level is 'Product' with keys 'Shirts', 'Pants', and 'Accessories', nested under each combination of 'Region' and 'Customer'.