I have this dataframe
with a MultiIndex
object as its index
.
apple
level_1 level_2 level_3 level_4 level_5
bga B G 1 0 5.0
How can I dynamically add a random amount of values to level 5?
I have this:
randData = ["r1", "r13", "r14", "r6"]
for x in randData:
df.loc[( "bga", "B", "G", "1", x ), "apple"] = np.random.randn(1)[0]
Which works great but for lists larger than 4 items takes forever. I was thinking of a simple slice of something like this:
df.loc[( "bga", "B", "G", "1" ), "apple"] = (randData, np.random.randn(4))
But it doesn't like that (which makes sense) and I can't think of any other way to slice it that would allow me to add values to the level.
I have thought of simply recreating the MultiIndex
and setting the new index, but thats a bit more work than I was looking for with the slicing option.
Does anyone else have any thoughts? TIA!
The MultiIndex seems to be slowing you down. Internally we are to think of these as tuples (I believe). Since here you want to do some parallelisation I would suggest to .reset_index
so the levels become columns. Then you can fuss around there.
I have used the pd.concat
which fills with NaN
those entries which we do not specify. Then can use the fillna
with ffill
to copy down the entries from above.
Code:
x = pd.DataFrame({'apple': 5.0}, index=pd.MultiIndex.from_tuples([( "bga", "B", "G", 1, 0)]))
print(x)
y = x.reset_index()
print(y)
z = pd.DataFrame({'level_4': ['r1', 'r13', 'r14', 'r16'], 'apple': np.random.randn(4)})
print(z)
w = pd.concat([y, z])
print(w)
u = w.fillna(method='ffill')
print(u)
v = u.set_index(['level_0', 'level_1', 'level_2', 'level_3', 'level_4'])
print(v)
Output:
apple
bga B G 1 0 5.0
level_0 level_1 level_2 level_3 level_4 apple
0 bga B G 1 0 5.0
level_4 apple
0 r1 0.286195
1 r13 -0.039496
2 r14 -0.571542
3 r16 1.196218
level_0 level_1 level_2 level_3 level_4 apple
0 bga B G 1.0 0 5.000000
0 NaN NaN NaN NaN r1 0.286195
1 NaN NaN NaN NaN r13 -0.039496
2 NaN NaN NaN NaN r14 -0.571542
3 NaN NaN NaN NaN r16 1.196218
level_0 level_1 level_2 level_3 level_4 apple
0 bga B G 1.0 0 5.000000
0 bga B G 1.0 r1 0.286195
1 bga B G 1.0 r13 -0.039496
2 bga B G 1.0 r14 -0.571542
3 bga B G 1.0 r16 1.196218
apple
level_0 level_1 level_2 level_3 level_4
bga B G 1.0 0 5.000000
r1 0.286195
r13 -0.039496
r14 -0.571542
r16 1.196218
Edit: timing comparison: