I'm getting an error when I try to create a new calculated column by dividing two numbers by the total sum (1+1/3+2+1+1).
The error is: ValueError: cannot join with no overlapping index names
This is currently what I have:
lvl0 a b
lvl1 bar foo bar foo
lvl2 fake real fake real fake real fake real
A0 B0 C0 D0 2 3 0 1 6 7 4 5
D1 10 11 8 9 14 15 12 13
C1 D0 18 19 16 17 22 23 20 21
D1 26 27 24 25 30 31 28 29
C2 D0 34 35 32 33 38 39 36 37
D1 42 43 40 41 46 47 44 45
C3 D0 50 51 48 49 54 55 52 53
D1 58 59 56 57 62 63 60 61
first = dfmi['a', 'foo', 'fake'] + dfmi['a', 'bar', 'fake']
dfmi = dfmi.loc[:, ('a', 'Fake %', 'Fake Calculation')] = first.div(dfmi.sum(axis=1, level=0), level=0)
You should be able to copy and paste to test
# creating MultiIndex
def mklbl(prefix, n):
return ["%s%s" % (prefix, i) for i in range(n)]
miindex = pd.MultiIndex.from_product([mklbl('A', 4),
mklbl('B', 2),
mklbl('C', 4),
mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples([('a', 'foo', 'fake'),
('a', 'foo', 'real'),
('a', 'bar', 'fake'),
('a', 'bar', 'real'),
('b', 'foo', 'fake'),
('b', 'foo', 'real'),
('b', 'bar', 'fake'),
('b', 'bar', 'real'),
],
names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns))
.reshape((len(miindex), len(micolumns))),
index=miindex,
columns=micolumns).sort_index().sort_index(axis=1)
# My Code
first = dfmi['a', 'foo', 'fake'] + dfmi['a', 'bar', 'fake']
dfmi = dfmi.loc[:, ('a', 'Fake %', 'Fake Calculation')] = first.div(dfmi.sum(axis=1, level=0), level=0)
This is what I want it to look like:
lvl0 a b
lvl1 bar foo bar foo
lvl2 fake real fake real CALCULATE fake real fake real
A0 B0 C0 D0 2 3 0 1 33% 6 7 4 5
D1 10 11 8 9 47% 14 15 12 13
C1 D0 18 19 16 17 ETC 22 23 20 21
D1 26 27 24 25 ETC 30 31 28 29
C2 D0 34 35 32 33 ETC 38 39 36 37
D1 42 43 40 41 ETC 46 47 44 45
C3 D0 50 51 48 49 ETC 54 55 52 53
D1 58 59 56 57 ETC 62 63 60 61
The calculate column is just (fake+fake)/(fake+real+fake+real) in the rows.
You can try with
df.sum(level=[0,2],axis=1).loc[:,pd.IndexSlice[:,'fake']].div(df.sum(level=0,axis=1),level=0,axis=0)