Search code examples
pythonpandasdataframemedian

Pandas: Column medians based on column names


I have the following pandas DataFrame.

df = pd.DataFrame(np.random.randn(3,6), columns=['A1','A2','A3','B1','B2','B3'])
df

           A1         A2          A3          B1          B2           B3
0   -0.409420   2.382457    1.151565    0.625461    0.224453    -0.351573
1   -0.676554   -1.485376   0.597227    0.240113    0.033963    1.224241
2   0.678698    1.392778    1.031625    0.388137    -0.566746   -0.798156

How do I get the median of the columns like this

        medA     medB
0        ...      ...
1        ...      ...
2        ...      ...   

My actual data frame has 300 columns, so I would like to differentiate by similarity in column name.


Solution

  • this looks like pd.wide_to_long:

    (pd.wide_to_long(df.reset_index(),['A','B'],'index','idx')
      .groupby('index').median().add_prefix('med_').rename_axis(None))
    

    Or groupby with the first string on axis=1

    df.groupby(df.columns.str[0],axis=1).median().add_prefix('med_')
    

          med_A     med_B
    0 -0.075465 -0.317335
    1 -0.355822 -0.517270
    2  0.279270 -1.134389