I have a dataframe:
a b val1_b1 val1_b2 val2_b1 val2_v2
1 2 5 9 4 6
I want to take the max by column group, so the dataframe will be:
a b val1 val2
1 2 9 6
or the RMS:
a b val1 val2
1 2 sqrt(106) sqrt(52)
What is the best way to do so?
You can use max
with DataFrame.groupby
by columns, axis=1
and lambda function fors values before first _
(if exist):
df1 = df.groupby(lambda x: x.split('_')[0], axis=1).max()
print (df1)
a b val1 val2
0 1 2 9 6
EDIT for RMS
use:
df2 = df.iloc[:, :2]
df3 = df.iloc[:, 2:].pow(2).groupby(lambda x: x.split('_')[0], axis=1).sum().pow(1./2)
#alternative
df3 = (df.iloc[:, 2:].groupby(lambda x: x.split('_')[0], axis=1)
.apply(lambda a: pd.Series(np.linalg.norm(a))))
df4 = pd.concat([df2, df3], axis=1)
print (df4)
a b val1 val2
0 1 2 10.29563 7.211103