Search code examples
pythonnumpyscipyinterpolationsmoothing

Smooth a curve in python with no error at the boundaries?


Consider the following curve associated with two numpy arrays x and y:

curve

How to smooth it correctly in python with no problem near xmax ? (if I apply a gaussian filter, the curve goes up at the end)

The data are here (two columns) : http://lite4.framapad.org/p/xqhpGJpV5R


Solution

  • The easiest way to do this is to detrend the signal before filtering. The edge effects you're seeing are mostly due to the fact that the signal isn't stationary (i.e. there's a slope to it).

    First-off, let's demonstrate the problem:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.ndimage import gaussian_filter1d
    
    x, y = np.loadtxt('spectrum.dat').T
    
    # Smooth with a guassian filter
    smooth = gaussian_filter1d(y, 10)
    
    fig, ax = plt.subplots()
    ax.loglog(x, y, color='black')
    ax.loglog(x, smooth, color='red')
    plt.show()
    

    enter image description here

    Ouch! The edge effects are particularly bad at the end (right-hand size) of your data because that's where the slope is the steepest. If you had a steeper slope at the start, you'd see a stronger edge effect there, as well.


    The good news is that there are a number of ways to correct for this. @ChristianK.'s answer shows how to use smoothing splines to effectively preform a low-pass filter. I'll demonstrate using a few other signal processing methods to accomplish the same thing. Which is "best" all depends on your needs. Smoothing splines is straight-forward. Using "fancier" signal processing methods gives you control over exactly what frequency is filtered out.

    Your data looks like a parabola in log-log space, so let's de-trend it with a second order polynomial in log-log space, and then apply the filter.

    As a quick example:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.ndimage import gaussian_filter1d
    
    x, y = np.loadtxt('spectrum.dat').T
    
    # Let's detrend by fitting a second-order polynomial in log space
    # (Note that your data looks like a parabola in log-log space.)
    logx, logy = np.log(x), np.log(y)
    model = np.polyfit(logx, logy, 2)
    trend = np.polyval(model, logx)
    
    # Smooth with a guassian filter
    smooth = gaussian_filter1d(logy - trend, 10)
    
    # Add the trend back in and convert back to linear space
    smooth = np.exp(smooth + trend)
    
    fig, ax = plt.subplots()
    ax.loglog(x, y, color='black')
    ax.loglog(x, smooth, color='red')
    plt.show()
    

    enter image description here

    Note that we still have some edge effects. This is because the gaussian filter I used causes a phase shift. If we really wanted to get fancy, we could detrend things and then use a zero-phase filter to further minimize the edge effects.

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.signal as signal
    
    def main():
        x, y = np.loadtxt('spectrum.dat').T
    
        logx, logy = np.log(x), np.log(y)
        smooth_log = detrend_zero_phase(logx, logy)
        smooth = np.exp(smooth_log)
    
        fig, ax = plt.subplots()
        ax.loglog(x, y, 'k-')
        ax.loglog(x, smooth, 'r-')
        plt.show()
    
    def zero_phase(y):
        # Low-pass filter...
        b, a = signal.butter(3, 0.05)
    
        # Filtfilt applies the filter twice to avoid phase shifts.
        return signal.filtfilt(b, a, y)
    
    def detrend_zero_phase(x, y):
        # Fit a second order polynomial (Can't just use scipy.signal.detrend here,
        # because we need to know what the trend is to add it back in.)
        model = np.polyfit(x, y, 2)
        trend = np.polyval(model, x)
    
        # Apply a zero-phase filter to the detrended curve.
        smooth = zero_phase(y - trend)
    
        # Add the trend back in
        return smooth + trend
    
    main()
    

    enter image description here