Search code examples
optimizationlinear-regressioncurve-fittingleast-squares

Piecewise linear fit


I am trying to automate the fitting of power-spectral densities. A typical plot look like this (loglog scale) : enter image description here

in red is one of my attempt at fitting it. I would like to have a way to fit up to 3 linear segments to the power-sepctral density plots. I have already tried several approaches, for instance by looking at 1 using curve_fit :

def logfit(x, a1, a2, b, cutoff):
    cutoff = int(params[3])
    out = np.empty_like(x)
    out[:cutoff] = x[:cutoff]*a1 + b
    out[cutoff:] = x[cutoff]*a1 + b + (x[cutoff:] - x[cutoff])*a2 
    return out   
#the fit is only linear on loglog scale, need to apply np.log10 to x and y.
popt, pcov = curve_fit(logfit, np.log10(x), np.log10(y), bounds = ([-2,-2,-np.inf,99],[0,0,np.inf,100]))

Which usually delivers a flat fit. I also looked at [2], making use of the lmfit package :

def logfit(params, x, data):
    a1, a2, b = params['a1'],  params['a2'], params['b']
    c = int(params['cutoff'])
    out = np.empty_like(x)
    out[:c] = x[:c]*a1 + b
    out[c:] = x[c]*a1 + b + (x[c:] - x[c])*a2 
    return data - out    

    params = Parameters()
    params.add('a1', value = -0.3, min = -2, max = 0)
    params.add('a2', value = -2, min = -2, max = -0.5)
    params.add('b', min = 0, max = 5)
    params.add('cutoff', min = 150, max = 500)
    params = minimize(logfit, params, args=(f, pxx))

But this also produces fits that are completely off (like the red fit in the plot above). Is this because there are too many parameters ? I would have thought this problem would be simple enough to solve by optimisation ...

1 How to apply piecewise linear fit for a line with both positive and negative slopes in Python?

[2] Fit a curve for data made up of two distinct regimes


Solution

  • Thanks to @M Newville, and using answers from [1] I came up with the following working solution :

    def continuousFit(x, y, weight = True):
        """
            fits data using two lines, forcing continuity between the fits.
            if `weights` = true, applies a weights to datapoints, to cope with unevenly distributed data.
        """
        lmodel = Model(continuousLines)
        params = Parameters()
        #Typical values and boundaries for power-spectral densities :
        params.add('a1', value = -1, min = -2, max = 0)
        params.add('a2', value = -1, min = -2, max = -0.5)
        params.add('b', min = 1, max = 10)
        params.add('cutoff', min = x[10], max = x[-10])
        if weight:
            #weights are based on the density of datapoints on the x-axis
            #since there are fewer points on the low-frequencies they get heavier weights.
            weights = np.abs(x[1:] - x[:-1])
            weights = weights/np.sum(weights)
            weights = np.append(weights, weights[-1])
            result = lmodel.fit(y, params, weights = weights, x=x)  
            return result.best_fit
        else:
            result = lmodel.fit(y, params, x=x)  
            return result.best_fit
    
    def continuousLines(x, a1, a2, b, cutoff):
        """two line segments, joined at a cutoff point in a continuous manner"""
        return a1*x + b  + a2*np.maximum(0, x - cutoff)
    

    [1] : Fit a curve for data made up of two distinct regimes