Tags: python, numpy, scipy, median

Computing a moving median with scipy generic_filter and scipy median_filter gives different outputs


I am looking to implement a fast moving median, as I have to compute many medians in my program. I would like to use Python built-in functions, as they should be better optimized than anything I could write myself.

My median should:

  • extract 5 values,
  • remove the center one,
  • find the median of the remaining 4 values.

Basically, multiple calls to:

numpy.median(np.array([0, 1, 2, 3, 4])[np.array([True, True, False, True, True])])
# (1. + 3.) / 2. = 2.0
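Over a whole array (ignoring the boundaries for a moment), this is equivalent to taking sliding windows of 5, dropping the center column, and taking the median along each row; a sketch, assuming numpy >= 1.20 for sliding_window_view:

```python
import numpy as np

v = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# sliding_window_view requires numpy >= 1.20
windows = np.lib.stride_tricks.sliding_window_view(v, 5)
# drop the center column, then take the median of the remaining 4 values
print(np.median(windows[:, [0, 1, 3, 4]], axis=1))
# [2. 3. 4. 5. 6. 7. 8.]
```

This only covers the interior points; the scipy filters below also handle the edges via the mode argument.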

I have found two functions: scipy.ndimage.generic_filter and scipy.ndimage.median_filter. My problem is that generic_filter gives the right output and median_filter does not, even though they seem to take the same parameters. Moreover, generic_filter is slower than median_filter, so I would like to know what I am doing wrong in my call to median_filter and use that one for speed.

import numpy as np
import scipy.ndimage as sc

v = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(sc.generic_filter(v, sc.median, footprint=np.array([1, 1, 0, 1, 1]), mode="mirror", output=np.float64))
%timeit sc.generic_filter(v, sc.median, footprint=np.array([1, 1, 0, 1, 1]), mode="mirror", output=np.float64)

print(sc.median_filter(v, footprint=np.array([1, 1, 0, 1, 1]), mode="mirror", output=np.float64))
%timeit sc.median_filter(v, footprint=np.array([1, 1, 0, 1, 1]), mode="mirror", output=np.float64)

As you can see, generic_filter gives the right output: [1.5 1.5 2. 3. 4. 5. 6. 7. 8. 8.5 8.5] (327 µs ± 15.2 µs per loop, mean ± std. dev. of 7 runs, 1000 loops each)

and median_filter is faster, but I don't understand its output: [2. 2. 3. 4. 5. 6. 7. 8. 9. 9. 9.] (12.4 µs ± 217 ns per loop, mean ± std. dev. of 7 runs, 100000 loops each)

Do you know what is wrong with my call?


Solution

  • the only difference seems to be due to how "ties" are handled:

    • sc.median returns the mean of the two middle values
    • sc.median_filter seems to systematically return the larger of the two middle values

    given the way median_filter is implemented, it's awkward to efficiently handle the special case "a median over an even number of elements should return the mean of the two middle values"
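    to see the two conventions side by side: at position 2, the window values are [0, 1, 3, 4]:

```python
import numpy as np

vals = np.array([0, 1, 3, 4])
# np.median (like sc.median) averages the two middle values
print(np.median(vals))                # 2.0
# median_filter effectively takes the element of rank len(vals) // 2,
# i.e. the larger of the two middle values
print(np.sort(vals)[len(vals) // 2])  # 3
```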

    I've hacked together a version that handles this case:

    from scipy.ndimage.filters import _rank_filter
    
    def median_filter(input, footprint, output=None, mode="reflect", cval=0.0, origin=0):
        filter_size = np.where(footprint, 1, 0).sum()  # number of active footprint elements
        rank = filter_size // 2  # the upper middle rank when filter_size is even
        result = _rank_filter(
            input, rank, None, footprint, output, mode, cval, origin, 'dummy')
        if filter_size % 2 == 0:
            if result is output:
                tmp = result.copy()
            else:
                tmp = result
            rank -= 1
            assert rank >= 0  # rank 0 (the minimum) is still valid for 2-element footprints
            result = _rank_filter(
                input, rank, None, footprint, output, mode, cval, origin, 'dummy')
            # fix up ties without creating any more garbage
            result += tmp
            result /= 2
        return result
    

    but it's kind of clunky, and it uses internal functionality from scipy (I'm using 1.3.0), so it is likely to break in the future

    on my machine these benchmark as:

    • sc.generic_filter takes 578 µs ± 8.51 µs per loop
    • sc.median_filter takes 27.4 µs ± 1.37 µs per loop
    • my median_filter takes 65.6 µs ± 1.29 µs per loop
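an alternative sketch that stays on scipy's public API: average two calls to sc.rank_filter at the two middle ranks. It makes two passes over the data, so it will likewise be slower than a single median_filter call, but it shouldn't break across scipy versions:

```python
import numpy as np
import scipy.ndimage as sc

v = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
fp = np.array([1, 1, 0, 1, 1])

# with 4 active footprint elements, ranks 1 and 2 are the two middle values
lo = sc.rank_filter(v, 1, footprint=fp, mode="mirror", output=np.float64)
hi = sc.rank_filter(v, 2, footprint=fp, mode="mirror", output=np.float64)
print((lo + hi) / 2)
# [1.5 1.5 2.  3.  4.  5.  6.  7.  8.  8.5 8.5]
```

this reproduces the generic_filter output above for this footprint.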