I have the following pandas DataFrame.
df = pd.DataFrame(np.random.randn(3,6), columns=['A1','A2','A3','B1','B2','B3'])
df
A1 A2 A3 B1 B2 B3
0 -0.409420 2.382457 1.151565 0.625461 0.224453 -0.351573
1 -0.676554 -1.485376 0.597227 0.240113 0.033963 1.224241
2 0.678698 1.392778 1.031625 0.388137 -0.566746 -0.798156
How do I get the median of the columns like this
medA medB
0 ... ...
1 ... ...
2 ... ...
My actual data frame has 300 columns, so I would like to differentiate by similarity in column name.
this looks like pd.wide_to_long
:
(pd.wide_to_long(df.reset_index(),['A','B'],'index','idx')
.groupby('index').median().add_prefix('med_').rename_axis(None))
Or groupby
with the first string on axis=1
df.groupby(df.columns.str[0],axis=1).median().add_prefix('med_')
med_A med_B
0 -0.075465 -0.317335
1 -0.355822 -0.517270
2 0.279270 -1.134389