I’m trying to get from a
to b
. I got a Pandas data frame similar to the a below.
data={'col1':['N1','N1','N2','N2', 'N2','N3'],
'col2':['DE','NO','DE','NO', 'IT','DE'],
'col3':[7, 5, 4, 1, 2, 8],
'col3_sum':[12, 12, 7, 7, 7, 8],
'col4':[0.6, 0.2, 0.7, 0.1, 0.2, 0.6],
'col4_sum':[0.8, 0.8, 1.0, 1.0, 1.0, 0.6],
'col5':[1,2,3,4,5,6]}
a=pd.DataFrame(data)
print(a)
col1 col2 col3 col3_sum col4 col4_sum col5
0 N1 DE 7 12 0.6 0.8 1
1 N1 NO 5 12 0.2 0.8 2
2 N2 DE 4 7 0.7 1.0 3
3 N2 NO 1 7 0.1 1.0 4
4 N2 IT 2 7 0.2 1.0 5
5 N3 DE 8 8 0.6 0.6 6
I realize I’ve backed myself into a corner by computing sums in a flat file. I’m new to Python. I guess I should create the sums when I'm done pivoting?
What I am stuck in is this wrong b
struggle,
b = df.pivot_table(index=['col1'],
values=['col3', 'col3_sum','col4', 'col4_sum'],
columns='col2')
# or
b = pd.pivot_table(a,index=['col1', 'col2'], columns=['col3', 'col4'],
aggfunc=len).reset_index()
# this makes senst to me, but not to python
a.pivot(index=['col1', 'col2'], columns='col2', values=['col3', 'col4'])
# print(b) # where I'm stuck at ...
I would like to get to something like this b
,
print(b) # my goal
col1 var var_sum DE NO IT
N1 col3 12 7 5
N1 col4 0.8 0.6 0.2
N2 col3 7 4 1 2
N2 col4 1.0 0.7 0.1 0.2
N3 col3 8 8
N3 col4 0.6 0.6
I'm not sure what to search for (some of the maybe relevant questions has way too much complexity for my to be able to extract what I need, at least at the moment). I've looked a lot at this answer, maybe I should find a way using .groupby()
Maybe you can compute the sum afterwards:
out = pd.melt(a, ["col1", "col2"], ["col3", "col4"]).pivot(
["col1", "variable"], "col2"
)
out["var_sum"] = out.sum(axis=1)
out = out.reset_index()
out.index.name, out.columns.name = None, None
out.columns = [
f"{a}_{b}".replace("value", "").strip("_") for a, b in out.columns
]
print(out)
Prints:
col1 variable DE IT NO var_sum
0 N1 col3 7.0 NaN 5.0 12.0
1 N1 col4 0.6 NaN 0.2 0.8
2 N2 col3 4.0 2.0 1.0 7.0
3 N2 col4 0.7 0.2 0.1 1.0
4 N3 col3 8.0 NaN NaN 8.0
5 N3 col4 0.6 NaN NaN 0.6