I am trying to calculate row sums of a pandas data frame for columns groupped according to an external vector. For example, for a matrix
matrix = pd.DataFrame(np.array( [[1,2,3,4,5], [10,20,30,40,50],[100,200,300,400,500] ] ))
matrix.columns = ["T1", "T3", "T1", "T1", "T3"]
print(matrix)
T1 T3 T1 T1 T3
0 1 2 3 4 5
1 10 20 30 40 50
2 100 200 300 400 500
the desired output is:
T1 T2 T3
0 8 0 7
1 80 0 70
2 800 0 700
Therefore, I would like to group columns by a potentially broader vector that columns names (matrix.columns
).
I obviosly tried:
matrix.groupby(matrix.columns, axis=1).sum()
T1 T3
0 8 7
1 80 70
2 800 700
but this yields a differrent result.
I also tried:
matrix.groupby(["T1", "T2", "T3"], axis=1).sum()
but this results in an error.
because T2
does not exist in matrix.columns
one method would be to add it.
It could either be added as a new zero value column
in the begining.
Alternatively, it could be added at the end (after the groupby
is done).
here is an example of adding it after the groupby is done:
import pandas as pd
import numpy as np
matrix = pd.DataFrame(np.array( [[1,2,3,4,5], [10,20,30,40,50],[100,200,300,400,500] ] ))
matrix.columns = ["T1", "T3", "T1", "T1", "T3"]
print(matrix)
z = matrix.groupby(matrix.columns, axis=1).sum()
print(z)
columns_to_add = []
columns_you_want = ["T1", "T2", "T3"]
for c in columns_you_want:
if c not in matrix.columns:
# print('we want this', c)
columns_to_add.append(c)
print('columns to add:', columns_to_add)
for c in columns_to_add:
# Add a new column with default value 0
print('adding', c)
z[c] = 0
print(z)
the result is this:
T1 T3 T1 T1 T3
0 1 2 3 4 5
1 10 20 30 40 50
2 100 200 300 400 500
T1 T3
0 8 7
1 80 70
2 800 700
columns to add: ['T2']
adding T2
T1 T3 T2
0 8 7 0
1 80 70 0
2 800 700 0