I have a df with multi index in both rows and cols, and I want to reset_index on both index and cols so that all the mulitindices are new columns. Below is an example of what I have and what I want.
What I have:
indexAarrays = [['bar', 'bar', 'baz', 'baz', ],
['one', 'two', 'one', 'two']]
indexTuples = list(zip(*indexAarrays))
index = pd.MultiIndex.from_tuples(indexTuples, names=['firstIndex', 'secondIndex'])
colAarrays = [['c1', 'c1', 'c2', 'c2', ],
['d1', 'd2', 'd1', 'd2']]
colTuples = list(zip(*colAarrays ))
col = pd.MultiIndex.from_tuples(colTuples, names=['firstCol', 'secondCol'])
df = pd.DataFrame(data=np.random.random_sample((len(index), len(col))),
index=index, columns=col)
df
The above gives the DF i have:
firstCol c1 c2
secondCol d1 d2 d1 d2
firstIndex secondIndex
bar one 0.231221 0.846196 0.037493 0.516474
two 0.810847 0.204095 0.423766 0.513262
baz one 0.433040 0.118018 0.267039 0.356261
two 0.529042 0.181886 0.093488 0.643357
What I want:
wantedCols = [idxName for idxName in index.names] \
+ [colName for colName in col.names]\
+ ['Value']
dfWanted = pd.DataFrame(index = range(int(df.shape[0]*df.shape[1]/(len(wantedCols)-1))),
columns=wantedCols)
idxCounter = 0
for idx1 in df.index.get_level_values(0).unique():
for idx2 in df.index.get_level_values(1).unique():
for c1 in df.columns.get_level_values(0).unique():
for c2 in df.columns.get_level_values(1).unique():
dfWanted.loc[idxCounter, 'firstIndex'] = idx1
dfWanted.loc[idxCounter, 'secondIndex'] = idx2
dfWanted.loc[idxCounter, 'firstCol'] = c1
dfWanted.loc[idxCounter, 'secondCol'] = c2
dfWanted.loc[idxCounter, 'Value'] = df.loc[(idx1, idx2), (c1, c2)]
idxCounter += 1
dfWanted
The above gives the DF I want:
firstIndex secondIndex firstCol secondCol Value
0 bar one c1 d1 0.231221
1 bar one c1 d2 0.846196
2 bar one c2 d1 0.037493
3 bar one c2 d2 0.516474
4 bar two c1 d1 0.810847
5 bar two c1 d2 0.204095
6 bar two c2 d1 0.423766
7 bar two c2 d2 0.513262
8 baz one c1 d1 0.43304
9 baz one c1 d2 0.118018
10 baz one c2 d1 0.267039
11 baz one c2 d2 0.356261
12 baz two c1 d1 0.529042
13 baz two c1 d2 0.181886
14 baz two c2 d1 0.0934878
15 baz two c2 d2 0.643357
Does anyone know of an easier way to reset the indices than the method I used above?
Use DataFrame.stack
by both levels and then convert MultiIndex Series
to columns by Series.reset_index
:
df = df.stack([0,1]).reset_index(name='Value')
print (df)
firstIndex secondIndex firstCol secondCol Value
0 bar one c1 d1 0.746027
1 bar one c1 d2 0.622784
2 bar one c2 d1 0.613197
3 bar one c2 d2 0.449560
4 bar two c1 d1 0.560810
5 bar two c1 d2 0.125046
6 bar two c2 d1 0.147148
7 bar two c2 d2 0.622862
8 baz one c1 d1 0.537280
9 baz one c1 d2 0.801410
10 baz one c2 d1 0.889445
11 baz one c2 d2 0.226477
12 baz two c1 d1 0.100759
13 baz two c1 d2 0.279383
14 baz two c2 d1 0.041767
15 baz two c2 d2 0.739942