I'm looking for a clean way to add a column to a multiindex dataframe, where the value is only repeated once per level=0.
For example,
I want to add a column to this:
Index level=0 | Index level=1 | Value (x) |
---|---|---|
A | 1 | 300 |
2 | 850 | |
3 | 2000 | |
B | 1 | 100 |
2 | 70 | |
3 | 400 |
In order to get to this:
Index level=0 | Index level=1 | Value (x) | Value (y) |
---|---|---|---|
A | 1 | 300 | Yellow |
2 | 850 | ||
3 | 2000 | ||
B | 1 | 100 | Red |
2 | 70 | ||
3 | 400 |
I do NOT want this:
Index level=0 | Index level=1 | Value (x) | Value (y) |
---|---|---|---|
A | 1 | 300 | Yellow |
2 | 850 | Yellow | |
3 | 2000 | Yellow | |
B | 1 | 100 | Red |
2 | 70 | Red | |
3 | 400 | Red |
I'm not sure how best to create a table here that shows what I'm hoping for, but the important part to me is that y corresponds to all rows of index level=0, but is not repeated for every increment of index level=1. I'm sure that I could the additional rows in the y column with null values but I thought there might be a more elegant way.
Use pd.IndexSlice
:
idx = pd.IndexSlice
df.loc[idx[:, 1], 'Color'] = ['Yellow', 'Red']
print(df)
# Output
Value Color
A 1 300 Yellow
2 850 NaN
3 2000 NaN
B 1 100 Red
2 70 NaN
3 400 NaN
Or only with slice
:
df.loc[(slice(None), 1), 'Color'] = ['Yellow', 'Red']
print(df)
# Output
Value Color
A 1 300 Yellow
2 850 NaN
3 2000 NaN
B 1 100 Red
2 70 NaN
3 400 NaN