python pandas dataframe group-by aggregate

Grouping pandas data frame columns according to an external vector

I am trying to calculate row sums of a pandas data frame for columns groupped according to an external vector. For example, for a matrix

matrix = pd.DataFrame(np.array( [[1,2,3,4,5], [10,20,30,40,50],[100,200,300,400,500]  ]  ))
matrix.columns =  ["T1", "T3", "T1", "T1", "T3"]
print(matrix)
    T1   T3   T1   T1   T3
0    1    2    3    4    5
1   10   20   30   40   50
2  100  200  300  400  500

the desired output is:

    T1  T2   T3
0    8   0    7
1   80   0   70
2  800   0  700

Therefore, I would like to group columns by a potentially broader vector that columns names (matrix.columns).

I obviosly tried:

matrix.groupby(matrix.columns, axis=1).sum()
    T1   T3
0    8    7
1   80   70
2  800  700

but this yields a differrent result.

I also tried:

matrix.groupby(["T1", "T2", "T3"], axis=1).sum()

but this results in an error.

Solution

because T2 does not exist in matrix.columns one method would be to add it.

It could either be added as a new zero value column in the begining.

Alternatively, it could be added at the end (after the groupby is done).

here is an example of adding it after the groupby is done:

import pandas as pd
import numpy as np

matrix = pd.DataFrame(np.array( [[1,2,3,4,5], [10,20,30,40,50],[100,200,300,400,500]  ]  ))
matrix.columns =  ["T1", "T3", "T1", "T1", "T3"]
print(matrix)

z = matrix.groupby(matrix.columns, axis=1).sum()
print(z)

columns_to_add = []
columns_you_want = ["T1", "T2", "T3"]
for c in columns_you_want:
    if c not in matrix.columns:
        # print('we want this', c)
        columns_to_add.append(c)

print('columns to add:', columns_to_add)

for c in columns_to_add:
    # Add a new column with default value 0
    print('adding', c)
    z[c] = 0
    

print(z)

the result is this:

    T1   T3   T1   T1   T3
0    1    2    3    4    5
1   10   20   30   40   50
2  100  200  300  400  500
    T1   T3
0    8    7
1   80   70
2  800  700
columns to add: ['T2']
adding T2
    T1   T3  T2
0    8    7   0
1   80   70   0
2  800  700   0