Here is my dataframe df
          1.1  1.2  1.3  2.1  ...  5.1  6.1  6.2  6.3
sample_a    1    1    2    4  ...    2    3    4    2
sample_b    2    3    3    1  ...    1    3    1    2
sample_c    2    4    3    1  ...    1    3    2    2
I want to group df by the first number of each column name (i.e. take 1 from 1.1, 2 from 2.1, 6 from 6.1) and aggregate each group by the median.
This is my desired output:
          1  2  ...  5  6
sample_a  1  4  ...  2  3
sample_b  3  1  ...  1  2
sample_c  3  1  ...  1  2
So for example, for the first element (sample_a, 1) the median of 1.1, 1.2, and 1.3 is 1.
This is the code I currently have.
df.columns = df.columns.str.extract('([\d])\.\d+',expand=False)
df.groupby(df.columns, axis=1).median(axis=1)
I'm not sure whether axis should be 0 or 1, but either way I get KeyError: 'axis'.
When I try the following code, it works fine.
df.columns = df.columns.str.extract('([\d])\.\d+',expand=False)
df.groupby(df.columns,axis=1).sum()
Why is median not working?
Use groupby on axis=1 and call median() without an axis argument, since the grouping axis is already set in groupby:
df.groupby(df.columns.str[0], axis=1).median()
          1  2  5  6
sample_a  1  4  2  3
sample_b  3  1  1  2
sample_c  3  1  1  2
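Newer pandas releases deprecate the axis argument of groupby, so here is a minimal sketch of an equivalent that avoids it by transposing first. The frame is rebuilt from the values shown in the question with the elided columns left out, and the names keys and result are only illustrative. Splitting on the dot instead of taking str[0] is an assumption on my part, so that a prefix with more than one digit (e.g. 10.1) would still group correctly.

```python
import pandas as pd

# Data copied from the question; the columns hidden behind "..." are omitted.
df = pd.DataFrame(
    {
        "1.1": [1, 2, 2],
        "1.2": [1, 3, 4],
        "1.3": [2, 3, 3],
        "2.1": [4, 1, 1],
        "5.1": [2, 1, 1],
        "6.1": [3, 3, 3],
        "6.2": [4, 1, 2],
        "6.3": [2, 2, 2],
    },
    index=["sample_a", "sample_b", "sample_c"],
)

# Group key for each column: the text before the first dot ("1.2" -> "1").
keys = df.columns.str.split(".").str[0]

# Transpose so the columns become rows, group those rows by key,
# take the median of each group, then transpose back.
result = df.T.groupby(keys).median().T
print(result)
```

The medians come back as floats (1.0 rather than 1); add .astype(int) at the end if you need integers and every median is a whole number.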