Say I have a multiindexed dataframe df1
:
x y
i0 i1
aaa a 1 6
b 2 5
c 3 4
bbb x 4 3
y 5 2
z 6 1
with a second dataframe df2
:
foo bar cat
0 0.925481 0.682102 aaa
1 0.620557 0.375114 aaa
2 0.788425 0.746436 bbb
3 0.247594 0.902258 aaa
4 0.146786 0.048585 bbb
5 0.256904 0.327165 bbb
and I want to use a list of second level indices of the first dataframe to fill a column in the second dataframe df2
as an array like this:
foo bar cat catlist
0 0.925481 0.682102 aaa ['a','b','c']
1 0.620557 0.375114 aaa ['a','b','c']
2 0.788425 0.746436 bbb ['x','y','z']
3 0.247594 0.902258 aaa ['a','b','c']
4 0.146786 0.048585 bbb ['x','y','z']
5 0.256904 0.327165 bbb ['x','y','z']
I can do this (slowly) by iterating, but how should this be done using a vectorized pandas method?
Iterating using apply
:
def iterate(row):
return list(df1.loc[row['cat']].index.get_level_values(0))
df2['catlist'] = df2.apply(iterate, axis=1)
You can use convert your MultiIndex to dataframe using pd.MultiIndex.to_frame
, then gropuby and use GroupBy.agg
to list. Now that mapping in pd.Series.map
mapping = df1.index.to_frame(index=False).groupby("i0")["i1"].agg(list)
df2["catlist"] = df2["cat"].map(mapping)
foo bar cat catlist
0 0.925481 0.682102 aaa [a, b, c]
1 0.620557 0.375114 aaa [a, b, c]
2 0.788425 0.746436 bbb [x, y, z]
3 0.247594 0.902258 aaa [a, b, c]
4 0.146786 0.048585 bbb [x, y, z]
5 0.256904 0.327165 bbb [x, y, z]