This has probably already been answered in several places, but I still cannot get my example to work without resorting to several lines of code.
I have a dataframe with multiindex columns, and I would like to add a calculated column to to every level-0 group.
arrays = [
np.array(["bar", "bar", "baz", "baz", "foo", "foo"]),
np.array(["one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.linspace(0, 4*6-1, 4*6).reshape(4,6), columns=arrays)
print(df)
bar baz foo
one two one two one two
0 0 1 2 3 4 5
1 6 7 8 9 10 11
2 12 13 14 15 16 17
3 18 19 20 21 22 23
Specifically, I want to add a column division
to bar
, baz
and foo
which divides element-wise one
and two
.
The way I am doing it now is this:
# divide the numbers of the two dataframes (have to drop the first level)
idx = pd.IndexSlice
div_res = df.loc[:, idx[:, "one"]].droplevel(1, axis=1) / df.loc[
:, idx[:, "two"]
].droplevel(1, axis=1)
div_res.columns = pd.MultiIndex.from_tuples([(y, "division") for y in div_res.columns])
# and concatenate the result again
result = (
pd.concat([df, div_res], axis=1).sort_index(
level=[0, 1],
axis=1,
)
)
print(result)
bar baz foo
division one two division one two division one two
0 0.0 0.0 1.0 0.7 2.0 3.0 0.8 4.0 5.0
1 0.9 6.0 7.0 0.9 8.0 9.0 0.9 10.0 11.0
2 0.9 12.0 13.0 0.9 14.0 15.0 0.9 16.0 17.0
3 0.9 18.0 19.0 1.0 20.0 21.0 1.0 22.0 23.0
This works, but it seems to me that there should be a more efficient way to do this.
Thanks a bunch!
As a bonus, I have not figured out how to sort the second column level: one
, two
, division
.
Let us try reshaping with stack
and unstack
s = df.stack(0)
s['div'] = s['one'].div(s['two'])
s.stack().unstack([1, 2])
bar baz foo
one two div one two div one two div
0 0.0 1.0 0.000000 2.0 3.0 0.666667 4.0 5.0 0.800000
1 6.0 7.0 0.857143 8.0 9.0 0.888889 10.0 11.0 0.909091
2 12.0 13.0 0.923077 14.0 15.0 0.933333 16.0 17.0 0.941176
3 18.0 19.0 0.947368 20.0 21.0 0.952381 22.0 23.0 0.956522