Tags: pandas, dataframe, mean, hierarchical-data, multi-index

How to set different arrays as minor index for dataframe with pandas multiindex


Two questions:

1) Is it possible to create a MultiIndex Pandas DataFrame with different "minor" indices e.g.:

       Col1   Col2
0
    a  0.1    0.01
    b  0.2    0.02
    c  0.3    0.03
1
    m  0.8    0.00
    n  0.9    0.01
    v  0.7    0.10

When using a pandas MultiIndex, I can only manage to set the same minor index for all major indices. Is there a way to specify different arrays, all of the same length, as minor indices?

2) Say the minor indices (a, b, c, m, n, v) were floats. Is there a way to use the pandas mean method to average these values? So far I can only average data that are not part of the index.

Thanks!


Solution

  • Yes and yes.

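    Create the multilevel index data as a list of (major, minor) tuples. Each major value can have its own minor labels, and the groups do not even need the same length: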

    array = list(zip([0]*3,list('abc')))+list(zip([1]*5,list('vwxyz')))
    array
    

    Output:

    [(0, 'a'),
     (0, 'b'),
     (0, 'c'),
     (1, 'v'),
     (1, 'w'),
     (1, 'x'),
     (1, 'y'),
     (1, 'z')]
    

    Use pd.MultiIndex.from_tuples to create the index, then build a DataFrame on it:

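    import numpy as np
    import pandas as pd

    # Build a two-level index from the (major, minor) tuples and name the levels
    idx = pd.MultiIndex.from_tuples(array, names=['one', 'two'])
    df = pd.DataFrame({'Col1': np.random.random(8), 'Col2': np.random.random(8) * 10}, index=idx)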
    
    print(df)
    

    Output:

                 Col1      Col2
    one two                    
    0   a    0.747933  3.191390
        b    0.020055  1.726661
        c    0.342344  5.595333
    1   v    0.298349  5.136354
        w    0.445190  3.952943
        x    0.921896  7.905128
        y    0.782851  0.132475
        z    0.259996  9.938946
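
For the exact layout in the question (two major groups, each with its own three minor labels), a minimal sketch using `pd.MultiIndex.from_arrays` might look like the following. The names `major`, `minor`, and `df_q` are illustrative and not part of the original answer; the values are copied from the table in the question.

```python
import pandas as pd

# Outer ("major") labels, repeated once per row.
major = [0, 0, 0, 1, 1, 1]
# Inner ("minor") labels; each major group gets its own set.
minor = ["a", "b", "c", "m", "n", "v"]

# Pair the two arrays position by position into a two-level index.
idx_q = pd.MultiIndex.from_arrays([major, minor], names=["major", "minor"])

df_q = pd.DataFrame(
    {
        "Col1": [0.1, 0.2, 0.3, 0.8, 0.9, 0.7],
        "Col2": [0.01, 0.02, 0.03, 0.00, 0.01, 0.10],
    },
    index=idx_q,
)

print(df_q)
# Selecting one major value returns only that group's minor labels.
print(df_q.loc[1])
```

`from_arrays` and `from_tuples` describe the same index; which one reads better depends on whether the labels are already stored per level or per row.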
    

    Compute the overall mean of each column:

    df.mean()
    

    Output:

    Col1    0.477327
    Col2    4.697404
    dtype: float64
    

    Compute the mean of each column per value of the outer level 'one':

    print(df.groupby(level=0).mean())
    

    Output:

             Col1      Col2
    one                    
    0    0.370111  3.504461
    1    0.541656  5.413169
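
The means above are taken over the column values. For the second question, where the minor index labels themselves are floats, one possible approach is to pull the level out of the index first and then average it. The sketch below assumes made-up float labels (1.0, 2.0, 3.0, 10.0, 20.0, 30.0) in place of a, b, c, m, n, v; the names `df_f`, `overall`, and `per_major` are illustrative.

```python
import pandas as pd

# Hypothetical float minor labels standing in for a, b, c, m, n, v.
idx_f = pd.MultiIndex.from_arrays(
    [[0, 0, 0, 1, 1, 1], [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]],
    names=["one", "two"],
)
df_f = pd.DataFrame({"Col1": [0.1, 0.2, 0.3, 0.8, 0.9, 0.7]}, index=idx_f)

# Mean of all minor labels: extract the level, convert to a Series, average.
overall = df_f.index.get_level_values("two").to_series().mean()

# Mean of the minor labels within each major value: move the level into a
# regular column, then group by the remaining index level.
per_major = df_f.reset_index(level="two").groupby(level="one")["two"].mean()

print(overall)
print(per_major)
```

Moving the level into a column with `reset_index` means the usual column-based methods such as `mean` apply to it without further changes.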