python pandas indexing axis median

Pandas DataFrame Median Function


If I have a Pandas DataFrame and want to calculate the median value for each column, it seems from the documentation that the argument axis=1 should give the median by column. In practice, though, axis=0 gives the column medians. Here is a simple reproducible example:

import pandas as pd

my_data = [[1.1, 2.2, 3.3], [1.2, 2.3, 3.4], [1.3, 2.4, 3.5]]
df = pd.DataFrame(my_data)
print(df.head())

print("\nTry to calculate median with axis=1\n")

print(df.median(axis=1))

This prints the median of each row. Changing it to axis=0 prints the median of each column. Does this have to do with the way that the index is set for the DataFrame?


Solution

  • It does what it is supposed to do: axis = 1 means the function is applied to each row. You can see this from another example:

    >>> print(df.sum(axis = 1))
    0    6.6
    1    6.9
    2    7.2
    dtype: float64
    

    Or equivalently

    >>> print(df.apply(sum, axis = 1))
    0    6.6
    1    6.9
    2    7.2
    dtype: float64
    

    and you can see in the documentation

    axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    
    Axis along which the function is applied:
    
    0 or ‘index’: apply function to each column.
    1 or ‘columns’: apply function to each row.
    

    So if you want to calculate the median of each column, you should use axis = 0.
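
To make the difference concrete, here is a minimal sketch that reuses the DataFrame from the question and computes the median both ways. The variable names (col_medians, row_medians and so on) are only illustrative and do not come from the original post; the string aliases "index" and "columns" are the ones listed in the documentation excerpt above.

```python
import pandas as pd

# Same data as in the question: 3 rows, 3 columns.
my_data = [[1.1, 2.2, 3.3], [1.2, 2.3, 3.4], [1.3, 2.4, 3.5]]
df = pd.DataFrame(my_data)

# axis=0 collapses the row index, leaving one median per column.
col_medians = df.median(axis=0)

# axis=1 collapses the columns, leaving one median per row.
row_medians = df.median(axis=1)

# The string aliases should behave the same as the integers.
col_medians_named = df.median(axis="index")
row_medians_named = df.median(axis="columns")

print("Median of each column (axis=0):")
print(col_medians)
print("\nMedian of each row (axis=1):")
print(row_medians)
```

One way to remember it: the axis argument names the axis that gets collapsed, not the axis that survives. With axis=0 the rows are reduced away and the result is indexed by column label; with axis=1 the columns are reduced away and the result is indexed by row label.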