
Sort data before using numpy.median


I'm measuring the median and percentiles of a sample of data using Python.

import numpy as np

# data: the sample (any 1-D array-like of numbers)
xmedian = np.median(data)
x25 = np.percentile(data, 25)
x75 = np.percentile(data, 75)

Do I have to use the np.sort() function on my data before measuring the median?


Solution

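  • You don't have to sort the data yourself. The numpy.median documentation defines the median as the middle value of a sorted copy of the input, and the function takes care of the ordering internally (current NumPy versions use a partial sort via np.partition rather than a full sort). The same holds for np.percentile. It is good practice to read a function's source code to understand how it works.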

    Example showing that sorting beforehand makes no difference:

    In [1]: import numpy as np
    
    In [2]: data = np.array([[ 10, 23,  1,  4,  5],
       ...:                  [  2, 12,  5, 22, 14]])
    
    In [3]: median = np.median(data)  # Median of unsorted data
    
    In [4]: median
    Out[4]: 7.5
    
    In [5]: sorted_data = np.sort(data, axis=None)  # Flattened, fully sorted copy
    
    In [6]: median_sorted = np.median(sorted_data)  # Median of the sorted data
    
    In [7]: median_sorted
    Out[7]: 7.5
    
    In [8]: median == median_sorted  # Check that they are equal
    Out[8]: True
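To see why input order doesn't matter, here is a minimal sketch of a selection-based median built on np.partition. It moves only the middle element(s) into their sorted positions instead of sorting everything. The helper median_via_partition is hypothetical and written for illustration. It assumes numeric input, flattens it the way np.median does with the default axis=None, and is not NumPy's actual implementation. The script also checks that np.median and np.percentile (the 25th and 75th percentiles from the question) return the same values for a shuffled and a sorted copy of the same data.

```python
import numpy as np

def median_via_partition(values):
    """Hypothetical helper: median via a partial sort (np.partition).

    Illustration only, not NumPy's implementation. Flattens the input,
    like np.median does with the default axis=None.
    """
    a = np.asarray(values, dtype=float).ravel()
    n = a.size
    if n == 0:
        raise ValueError("median of an empty array is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: only the middle element must land in its sorted position.
        return np.partition(a, mid)[mid]
    # Even length: place both middle elements, then average them.
    part = np.partition(a, [mid - 1, mid])
    return (part[mid - 1] + part[mid]) / 2.0

data = np.array([[10, 23, 1, 4, 5],
                 [2, 12, 5, 22, 14]])

rng = np.random.default_rng(0)
shuffled = rng.permutation(data.ravel())  # same values in a random order
fully_sorted = np.sort(data, axis=None)   # flattened, fully sorted copy

print(median_via_partition(data))                     # 7.5
print(np.median(shuffled), np.median(fully_sorted))   # 7.5 7.5

# The percentiles from the question do not depend on input order either.
for q in (25, 75):
    print(q, np.percentile(shuffled, q), np.percentile(fully_sorted, q))
```

np.partition only guarantees that the requested indices hold the values they would have in a sorted array. That takes linear time on average, compared with O(n log n) for a full sort. Sorting the data yourself first therefore adds work without changing the result.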