Tags: python, numpy, pandas, median

Median of a list with NaN values removed, in python


Is it possible to calculate the median of a list without explicitly removing the NaNs, simply ignoring them instead?

I want median([1,2,3,NaN,NaN,NaN,NaN,NaN,NaN]) to be 2, not NaN.


Solution

  • numpy 1.9.0 and later provide the function nanmedian:

    nanmedian(a, axis=None, out=None, overwrite_input=False, keepdims=False)
        Compute the median along the specified axis, while ignoring NaNs.
    
        Returns the median of the array elements.
    
        .. versionadded:: 1.9.0
    

    E.g.

    >>> from numpy import nanmedian, nan as NaN  # the numpy.NaN alias was removed in numpy 2.0
    >>> nanmedian([1,2,3,NaN,NaN,NaN,NaN,NaN,NaN])
    2.0
    

    If you are stuck on a numpy release older than 1.9.0, you can filter out the NaNs before taking the median, as in @Parker's answer; e.g.

    >>> import numpy as np
    >>> x = np.array([1, 2, 3, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])
    >>> np.median(x[~np.isnan(x)])
    2.0
    

    or

    >>> np.median(x[np.isfinite(x)])
    2.0
    

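    (When applied to a boolean array, ~ is the element-wise logical NOT, so ~np.isnan(x) selects the entries that are not NaN. Note that np.isfinite(x) also excludes infinities, not just NaNs.)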
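
The two approaches above can be combined into one small helper. This is only a sketch: the name median_ignoring_nan is made up for illustration, and it assumes the input can be converted to a float array. It also shows where the isnan and isfinite masks give different results, namely when the data contains an infinity.

```python
import numpy as np

def median_ignoring_nan(values):
    """Median of `values` with NaN entries ignored (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    if hasattr(np, "nanmedian"):
        # nanmedian was added in numpy 1.9.0
        return float(np.nanmedian(arr))
    # Fallback for older numpy: drop the NaNs with a boolean mask
    return float(np.median(arr[~np.isnan(arr)]))

data = [1, 2, 3, np.nan, np.nan, np.nan]
result = median_ignoring_nan(data)

# The two masks differ once an infinity is present:
with_inf = np.array([1.0, 2.0, np.inf, np.nan])
keep_inf = np.median(with_inf[~np.isnan(with_inf)])    # median of [1, 2, inf]
drop_inf = np.median(with_inf[np.isfinite(with_inf)])  # median of [1, 2]
```

Here result is 2.0, keep_inf is 2.0 and drop_inf is 1.5, so which mask to use depends on whether infinities should count as data.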