I have a MultiIndexed DataFrame df1, and would like to loop over it in such a way as to in each instance of the loop have a DataFrame with a regular non-hierarchical index which is the subset of df1 corresponding to the outer index entries. I.e., if i have:
I want to get
and subsequently C1, C2, etc. I also don't know what the names of these will actually be (C1, etc., just being placeholders here), so would just like to loop over the number of Ci values I have.
I have been stumbling around with iterrows
and various loops and not getting any tangible results and don't really know how to proceed. I feel like a simple solution should exist but couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.
Using a modified example from here
In [30]: def mklbl(prefix,n):
return ["%s%s" % (prefix,i) for i in range(n)]
....:
In [31]: columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
('b','foo'),('b','bah')],
names=['lvl0', 'lvl1'])
In [33]: index = MultiIndex.from_product([mklbl('A',4),mklbl('B',2)])
In [34]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
index=index,
columns=columns).sortlevel().sortlevel(axis=1)
In [35]: df
Out[35]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 1 0 3 2
B1 5 4 7 6
A1 B0 9 8 11 10
B1 13 12 15 14
A2 B0 17 16 19 18
B1 21 20 23 22
A3 B0 25 24 27 26
B1 29 28 31 30
In [36]: df.loc['A0']
Out[36]:
lvl0 a b
lvl1 bar foo bah foo
B0 1 0 3 2
B1 5 4 7 6
In [37]: df.loc['A1']
Out[37]:
lvl0 a b
lvl1 bar foo bah foo
B0 9 8 11 10
B1 13 12 15 14
No looping is necessary.
You can also select these in order to return a frame (with the original MI)
e.g. df.loc[['A1']]
If you want to get the values in the index:
In [38]: df.index.get_level_values(0).unique()
Out[38]: array(['A0', 'A1', 'A2', 'A3'], dtype=object)