
What is the difference between medfilt from scipy.signal and rolling().median from pandas?


I want to use a median filter to smooth a signal, and I see there are two methods in Python that can be used:

  • medfilt from scipy.signal
  • DataFrame.rolling().median() from pandas

With the same window size, these two methods give different results. I have attached an example data set. Furthermore, with the second method the number of valid data points changes when the filter is applied (depending on the window size), which I expect, whereas with the first method the number of smoothed data points is the same as in the original data.

What is the difference between these two methods and why are different results obtained?

import pandas as pd
import scipy.signal as ss

signal = [4, 3.8, 3.75, 3.9, 3.53, 3.26, 2.33, 2.8, 2.5, 2.4, 2, 2.2, 1.5, 1.7]

# First method
SmoothedSignal = ss.medfilt(signal, kernel_size=5)
print(SmoothedSignal)
print(len(SmoothedSignal))

# Second method
signal = pd.DataFrame(signal)
RollingMedian = signal.rolling(5).median()
print(RollingMedian)
print(len(RollingMedian))

Solution

  • The cause of the differing median values is the alignment of the kernel. pandas.DataFrame.rolling right aligns the kernel by default, while scipy.signal.medfilt center aligns its kernel.

    You can center align the DataFrame.rolling by setting the center keyword argument to True.

    import pandas as pd
    import scipy.signal as ss
    
    signal = [4, 3.8, 3.75, 3.9, 3.53, 3.26, 2.33, 2.8, 2.5, 2.4, 2, 2.2, 1.5, 1.7]
    
    # scipy - the kernel is centered on each sample
    smoothed = ss.medfilt(signal, kernel_size=5)
    
    # rolling - right aligned (default): the window ends at the current sample
    series = pd.Series(signal)
    rolling_right = series.rolling(5).median()
    
    # rolling - center aligned
    rolling_center = series.rolling(5, center=True).median()
    
    # collect the three results side by side
    df = pd.DataFrame({
        'smooth': smoothed,
        'rolling_center': rolling_center,
        'rolling_right': rolling_right,
    })
    
    print(df)
    
    # output
        smooth  rolling_center  rolling_right
    0   3.75    NaN             NaN
    1   3.80    NaN             NaN
    2   3.80    3.80            NaN
    3   3.75    3.75            NaN
    4   3.53    3.53            3.80
    5   3.26    3.26            3.75
    6   2.80    2.80            3.53
    7   2.50    2.50            3.26
    8   2.40    2.40            2.80
    9   2.40    2.40            2.50
    10  2.20    2.20            2.40
    11  2.00    2.00            2.40
    12  1.70    NaN             2.20
    13  1.50    NaN             2.00
    

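    You'll also notice that the two methods treat the edges differently. rolling returns NaN wherever the window is not completely filled (min_periods defaults to the window size), while medfilt pads the signal with zeros and therefore returns a value for every sample. In both cases the output has the same length as the input; with rolling, only the number of non-NaN values shrinks.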
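If you want the pandas result to match medfilt at the edges as well, one option is to zero-pad the signal yourself before rolling. The following is a minimal sketch under that assumption, not the only way to do it; the names `padded`, `matched` and `shrinking` are illustrative and not part of either library.

```python
import numpy as np
import pandas as pd
from scipy.signal import medfilt

signal = [4, 3.8, 3.75, 3.9, 3.53, 3.26, 2.33, 2.8, 2.5, 2.4, 2, 2.2, 1.5, 1.7]
window = 5
half = window // 2  # samples on each side of the center

# Reference result: medfilt zero-pads the signal internally
reference = medfilt(signal, kernel_size=window)

# Emulate this with pandas: pad with zeros, roll with a centered window,
# then drop the padded positions again
padded = pd.Series(np.pad(signal, half, mode="constant", constant_values=0.0))
matched = padded.rolling(window, center=True).median().iloc[half:-half].to_numpy()

print(np.allclose(matched, reference))

# Alternative without padding: min_periods=1 lets the window shrink at the
# edges, so there are no NaN values, but the edge values differ from medfilt
shrinking = pd.Series(signal).rolling(window, center=True, min_periods=1).median()
print(shrinking.isna().sum())
```

Which edge treatment is appropriate depends on the data: zero padding pulls the first and last values toward zero, while a shrinking window uses only the samples that actually exist.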