Tags: python, multithreading, matlab, numpy, median

Multithreading 1D Median Filtering in Python


I have tried the following Python median-filtering functions on time-series signals to find the fastest and most efficient one.

sig is a NumPy array of size 80×188 containing 188 samples measured by 80 sensors.

import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import medfilt
from scipy.signal import medfilt2d
import time

sig = np.random.rand(80,188).astype('f')
print(type(sig))
print(type(sig[0][0]))


window_length = 181

t = time.time()
sigFiltered = medfilt2d(sig, (1,window_length))
elapsed = time.time() - t
print('scipy.signal.medfilt2d: %g seconds' % elapsed)

t = time.time()
sigFiltered = median_filter(sig, (1,window_length))
elapsed = time.time() - t
print('scipy.ndimage.median_filter: %g seconds' % elapsed)

t = time.time()
sigFiltered = medfilt(sig, (1,window_length))
elapsed = time.time() - t
print('scipy.signal.medfilt: %g seconds' % elapsed)

The code can be tried here.

The result of the filter is another time-series array of size 80×188 with smoothed time-points for each sensor.

MATLAB's medfilt1(sig, 181, [], 2) performs the filtering on the same data 10 times faster than scipy.signal.medfilt2d, which was the fastest of the functions above: on my machine, MATLAB takes 2 ms vs. Python's 20 ms. I suspect MATLAB uses multithreading and Python does not.

Is there any way to perform multithreaded median filtering to speed up the process and assign sensors to different threads? Is there a more efficient median filter available in Python? Can I match the performance of MATLAB in Python, or at least get closer to it?


Solution

  • With such a long filter relative to the input, most outputs of a standard medfilt are going to be identical. Were this a convolution, this would be a "full" convolution. If you instead compute only the outputs of a "valid" convolution, it will be much faster in this case:

    t = time.time()
    medians = []
    # "Valid" outputs only: windows that fit entirely inside the signal.
    for i in range(sig.shape[1] - window_length + 1):
        sig2 = sig[:, i:i+window_length]
        f = np.median(sig2, axis=1)
        medians.append(f)
    
    sigFiltered = np.stack(medians).T
    elapsed = time.time() - t
    print('numpy.median: %g seconds' % elapsed)
    
    numpy.median: 0.0015518 seconds
    

    This is in the ballpark of the requested ~1 ms runtime for a 188-sample input.

    Note that each output value will change only slowly and rarely as new input samples arrive. You could therefore speed this up considerably by using a hop size larger than 1.
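
    As a minimal sketch of that hop idea (the hop value of 4 here is an arbitrary assumption; in practice you would choose it based on how quickly the median is allowed to drift):

    ```python
    import numpy as np

    sig = np.random.rand(80, 188).astype('f')
    window_length = 181
    hop = 4  # assumed hop; larger values trade temporal resolution for speed

    # Evaluate the median only at every hop-th valid window position
    # instead of at every sample.
    starts = range(0, sig.shape[1] - window_length + 1, hop)
    medians = [np.median(sig[:, i:i + window_length], axis=1) for i in starts]
    sigFiltered = np.stack(medians).T  # one column per evaluated window
    ```

    Intermediate time points could then be filled in by repeating or interpolating the hopped medians, since at this window length the median changes very little between neighbouring positions.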