Search code examples
pythonpandasmulti-index

How to dynamically add an arbitrary number of values to a specific level in a pandas multiindex using python?


I have this dataframe with a MultiIndex object as its index.

                                         apple
level_1 level_2 level_3 level_4 level_5
bga     B       G       1       0          5.0

How can I dynamically add a random amount of values to level 5?

I have this:

randData = ["r1", "r13", "r14", "r6"]
for x in randData:
    df.loc[( "bga", "B", "G", "1", x ), "apple"] = np.random.randn(1)[0]

Which works great but for lists larger than 4 items takes forever. I was thinking of a simple slice of something like this:

df.loc[( "bga", "B", "G", "1" ), "apple"] = (randData, np.random.randn(4))

But it doesn't like that (which makes sense) and I can't think of any other way to slice it that would allow me to add values to the level.

I have thought of simply recreating the MultiIndex and setting the new index, but thats a bit more work than I was looking for with the slicing option.

Does anyone else have any thoughts? TIA!


Solution

  • The MultiIndex seems to be slowing you down. Internally we are to think of these as tuples (I believe). Since here you want to do some parallelisation I would suggest to .reset_index so the levels become columns. Then you can fuss around there.

    I have used the pd.concat which fills with NaN those entries which we do not specify. Then can use the fillna with ffill to copy down the entries from above.

    Code:

    x = pd.DataFrame({'apple': 5.0}, index=pd.MultiIndex.from_tuples([( "bga", "B", "G", 1, 0)]))
    print(x)
    
    y = x.reset_index()
    print(y)
    
    z = pd.DataFrame({'level_4': ['r1', 'r13', 'r14', 'r16'], 'apple': np.random.randn(4)})
    print(z)
    
    w = pd.concat([y, z])
    print(w)
    
    u = w.fillna(method='ffill')
    print(u)
    
    v = u.set_index(['level_0', 'level_1', 'level_2', 'level_3', 'level_4'])
    print(v)
    

    Output:

                 apple
    bga B G 1 0    5.0
      level_0 level_1 level_2  level_3  level_4  apple
    0     bga       B       G        1        0    5.0
      level_4     apple
    0      r1  0.286195
    1     r13 -0.039496
    2     r14 -0.571542
    3     r16  1.196218
      level_0 level_1 level_2  level_3 level_4     apple
    0     bga       B       G      1.0       0  5.000000
    0     NaN     NaN     NaN      NaN      r1  0.286195
    1     NaN     NaN     NaN      NaN     r13 -0.039496
    2     NaN     NaN     NaN      NaN     r14 -0.571542
    3     NaN     NaN     NaN      NaN     r16  1.196218
      level_0 level_1 level_2  level_3 level_4     apple
    0     bga       B       G      1.0       0  5.000000
    0     bga       B       G      1.0      r1  0.286195
    1     bga       B       G      1.0     r13 -0.039496
    2     bga       B       G      1.0     r14 -0.571542
    3     bga       B       G      1.0     r16  1.196218
                                                apple
    level_0 level_1 level_2 level_3 level_4          
    bga     B       G       1.0     0        5.000000
                                    r1       0.286195
                                    r13     -0.039496
                                    r14     -0.571542
                                    r16      1.196218
    

    Edit: timing comparison:

    timings, 100X faster my way