Search code examples
pythonpandasdataframedata-munging

Python pandas apply function on columns value (base on columns names patern)


I have a dataframe:

a b  val1_b1 val1_b2 val2_b1 val2_v2
1 2    5       9        4      6

I want to take the max by column group, so the dataframe will be:

a b  val1  val2 
1 2    9     6  

or the RMS:

a b       val1    val2
1 2    sqrt(106)   sqrt(52)   

What is the best way to do so?


Solution

  • You can use max with DataFrame.groupby by columns, axis=1 and lambda function fors values before first _ (if exist):

    df1 = df.groupby(lambda x: x.split('_')[0], axis=1).max()
    print (df1)
       a  b  val1  val2
    0  1  2     9     6
    

    EDIT for RMS use:

    df2 = df.iloc[:, :2]
    df3 = df.iloc[:, 2:].pow(2).groupby(lambda x: x.split('_')[0], axis=1).sum().pow(1./2)
    
    #alternative
    df3 = (df.iloc[:, 2:].groupby(lambda x: x.split('_')[0], axis=1)
             .apply(lambda a: pd.Series(np.linalg.norm(a))))
    
    df4 = pd.concat([df2, df3], axis=1)
    print (df4)
    
       a  b      val1      val2
    0  1  2  10.29563  7.211103