Search code examples
pythonpandasdataframemulti-index

Vectorized method to fill dataframe column from indices from another Multiindexed dataframe?


Say I have a multiindexed dataframe df1:

        x   y
i0  i1      
aaa a   1   6
    b   2   5
    c   3   4
bbb x   4   3
    y   5   2
    z   6   1

with a second dataframe df2:

       foo         bar      cat
0   0.925481    0.682102    aaa
1   0.620557    0.375114    aaa
2   0.788425    0.746436    bbb
3   0.247594    0.902258    aaa
4   0.146786    0.048585    bbb
5   0.256904    0.327165    bbb

and I want to use a list of second level indices of the first dataframe to fill a column in the second dataframe df2 as an array like this:

       foo         bar      cat      catlist
0   0.925481    0.682102    aaa   ['a','b','c']
1   0.620557    0.375114    aaa   ['a','b','c']
2   0.788425    0.746436    bbb   ['x','y','z']
3   0.247594    0.902258    aaa   ['a','b','c']
4   0.146786    0.048585    bbb   ['x','y','z']
5   0.256904    0.327165    bbb   ['x','y','z']

I can do this (slowly) by iterating, but how should this be done using a vectorized pandas method?

Iterating using apply :

def iterate(row):
    return list(df1.loc[row['cat']].index.get_level_values(0))

df2['catlist'] = df2.apply(iterate, axis=1)

Solution

  • You can use convert your MultiIndex to dataframe using pd.MultiIndex.to_frame, then gropuby and use GroupBy.agg to list. Now that mapping in pd.Series.map

    mapping = df1.index.to_frame(index=False).groupby("i0")["i1"].agg(list)
    df2["catlist"] = df2["cat"].map(mapping)
    
            foo       bar  cat    catlist
    0  0.925481  0.682102  aaa  [a, b, c]
    1  0.620557  0.375114  aaa  [a, b, c]
    2  0.788425  0.746436  bbb  [x, y, z]
    3  0.247594  0.902258  aaa  [a, b, c]
    4  0.146786  0.048585  bbb  [x, y, z]
    5  0.256904  0.327165  bbb  [x, y, z]