
How do you calculate median on a DataFrameGroupBy object?


Here is my dataframe df

          1.1  1.2  1.3  2.1  ...  5.1  6.1  6.2  6.3
sample_a    1    1    2    4         2    3    4    2
sample_b    2    3    3    1         1    3    1    2
sample_c    2    4    3    1         1    3    2    2

I want to group df by extracting the first number of the column name (i.e. take 1 from 1.1, take 2 from 2.1, take 6 from 6.1) and aggregate the df by the median.

This is my desired output:

            1    2    ...   5    6
sample_a    1    4          2    3 
sample_b    3    1          1    2 
sample_c    3    1          1    2 

So for example, for the first element (sample_a, 1) the median of 1.1, 1.2, and 1.3 is 1.

This is the code I currently have.

df.columns = df.columns.str.extract('([\d])\.\d+',expand=False)
df.groupby(df.columns, axis=1).median(axis=1)

I'm not sure whether axis should be 0 or 1, but either way I get KeyError: 'axis'.

When I try the following code, it works fine.

df.columns = df.columns.str.extract('([\d])\.\d+',expand=False)
df.groupby(df.columns,axis=1).sum()

Why is median not working?


Solution

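  • Use groupby on axis=1 and call median() with no arguments. The grouping axis is already fixed by groupby, so median() does not need an axis argument.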

    df.groupby(df.columns.str[0], axis=1).median()
    

              1  2  5  6
    sample_a  1  4  2  3
    sample_b  3  1  1  2
    sample_c  3  1  1  2
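
As a hedged alternative: the axis argument of DataFrame.groupby is deprecated since pandas 2.1, so on newer versions the same grouping can be done by transposing first. The sketch below is an assumption-laden reconstruction, not the original poster's data: it includes only the columns actually shown in the question (the elided "..." columns are left out), and it derives the group label by splitting on the dot rather than taking the first character, so that multi-digit prefixes such as 10.1 would also work.

```python
import pandas as pd

# Only the columns visible in the question; the elided ones ("...") are omitted.
df = pd.DataFrame(
    {
        "1.1": [1, 2, 2],
        "1.2": [1, 3, 4],
        "1.3": [2, 3, 3],
        "2.1": [4, 1, 1],
        "5.1": [2, 1, 1],
        "6.1": [3, 3, 3],
        "6.2": [4, 1, 2],
        "6.3": [2, 2, 2],
    },
    index=["sample_a", "sample_b", "sample_c"],
)

# Group label = text before the first dot ("1.1" -> "1").
labels = df.columns.str.split(".").str[0]

# Transpose so the original columns become rows, group those rows by label,
# take the median of each group, then transpose back.
result = df.T.groupby(labels).median().T
print(result)
```

The values match the desired output in the question, except that median() returns floats (1.0 rather than 1); append .astype(int) if integer output is wanted and the medians are known to be whole numbers.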