Search code examples
pythonpandasmulti-index

Looping over a MultiIndex in pandas


I have a MultiIndexed DataFrame df1, and would like to loop over it in such a way as to in each instance of the loop have a DataFrame with a regular non-hierarchical index which is the subset of df1 corresponding to the outer index entries. I.e., if i have:

FirstTable

I want to get

SecondTable

and subsequently C1, C2, etc. I also don't know what the names of these will actually be (C1, etc., just being placeholders here), so would just like to loop over the number of Ci values I have.

I have been stumbling around with iterrows and various loops and not getting any tangible results and don't really know how to proceed. I feel like a simple solution should exist but couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.


Solution

  • Using a modified example from here

    In [30]: def mklbl(prefix,n):
            return ["%s%s" % (prefix,i)  for i in range(n)]
       ....: 
    
    In [31]: columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
                                      ('b','foo'),('b','bah')],
                                       names=['lvl0', 'lvl1'])
    
    In [33]: index = MultiIndex.from_product([mklbl('A',4),mklbl('B',2)])
    
    In [34]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
                   index=index,
                   columns=columns).sortlevel().sortlevel(axis=1)
    
    In [35]: df
    Out[35]: 
    lvl0     a         b     
    lvl1   bar  foo  bah  foo
    A0 B0    1    0    3    2
       B1    5    4    7    6
    A1 B0    9    8   11   10
       B1   13   12   15   14
    A2 B0   17   16   19   18
       B1   21   20   23   22
    A3 B0   25   24   27   26
       B1   29   28   31   30
    
    In [36]: df.loc['A0']
    Out[36]: 
    lvl0    a         b     
    lvl1  bar  foo  bah  foo
    B0      1    0    3    2
    B1      5    4    7    6
    
    In [37]: df.loc['A1']
    Out[37]: 
    lvl0    a         b     
    lvl1  bar  foo  bah  foo
    B0      9    8   11   10
    B1     13   12   15   14
    

    No looping is necessary.

    You can also select these in order to return a frame (with the original MI) e.g. df.loc[['A1']]

    If you want to get the values in the index:

    In [38]: df.index.get_level_values(0).unique()
    Out[38]: array(['A0', 'A1', 'A2', 'A3'], dtype=object)