Summing columns of Pandas dataframe in a systematic way

I have a pandas dataframe which looks like this:

1_2 1_3 1_4 2_3 2_4 3_4
1   5   2   8   2   2
4   3   4   5   8   5
8   8   8   9   3   3
4   3   4   4   8   3
8   0   7   4   2   2

where the columns are the 4C2 combinations of 1,2,3,4. And I would like to generate 4 new columns f_1, f_2, f_3, f_4 where the values of the columns are defined to be

df['f_1'] = df['1_2']+df['1_3']+df['1_4']
df['f_2'] = df['1_2']+df['2_3']+df['2_4']
df['f_3'] = df['1_3']+df['2_3']+df['3_4']
df['f_4'] = df['1_4']+df['2_4']+df['3_4']

In other words, the column f_i are defined to be the sum of columns i_j and k_i.

So I can brute force my way in this case. However, my original dataframe is a lot bigger and there are 20C2 = 190 columns instead and hence a brute force method wouldn't work.

So the desired outcome looks like

1_2 1_3 1_4 2_3 2_4 3_4 f_1 f_2 f_3 f_4
1   5   2   8   2   2   8   11  15  6
4   3   4   5   8   5   11  17  13  17
8   8   8   9   3   3   24  20  20  14
4   3   4   4   8   3   11  16  10  15
8   0   7   4   2   2   15  14  6   11

Thank you so much.

Solution

Build a dictionary of the columns with str.split+explode+Index.groupby, and process them in a loop:

s = df.columns.to_series().str.split('_').explode()
d = s.index.groupby(s)

for k, v in d.items():
    df[f'f_{k}'] = df[v].sum(axis=1)

You could also use eval instead of the loop once you have d:

query = '\n'.join(f'f_{k} = {"+".join(map("`{}`".format, v))}'
                  for k,v in d.items())
out = df.eval(query)

Output:

   1_2  1_3  1_4  2_3  2_4  3_4  f_1  f_2  f_3  f_4
0    1    5    2    8    2    2    8   11   15    6
1    4    3    4    5    8    5   11   17   13   17
2    8    8    8    9    3    3   24   20   20   14
3    4    3    4    4    8    3   11   16   10   15
4    8    0    7    4    2    2   15   14    6   11

Intermediate d:

{'1': ['1_2', '1_3', '1_4'],
 '2': ['1_2', '2_3', '2_4'],
 '3': ['1_3', '2_3', '3_4'],
 '4': ['1_4', '2_4', '3_4'],
}

Pure python approach to build d:

d = {}
for c in df:
    for k in c.split('_'):
        d.setdefault(k, []).append(c)

You could also imagine a pure pandas approach based on reshaping with melt+pivot_table, but this is most likely much less efficient:

out = df.join(df
   .set_axis(df.columns.str.split('_'), axis=1)
   .melt(ignore_index=False).explode('variable')
   .reset_index()
   .pivot_table(index='index', columns='variable',
                values='value', aggfunc='sum')
   .add_prefix('f_')
)