Search code examples
pythoncurve-fittingscipy-optimizepiecewise

How to fit a piecewise step function to be below my data points?


I have a custom function that takes 6 parameters. It creates a step "curve".
I want to optimize/fit the last 5 parameters (m1, m2, FL1, FL2 and FL3).

def steps(x, FL1, m1, FL2, m2, FL3):
    if x > m1:
        return FL1
    if x > m2:
        return FL2
    else:
        return FL3

I want to fit this function to always be below my data points.

Example:

Simplified Example

The real data looks slightly more complex, like this, but still not complex:

Actual data

What functionality can I use here to fit this function?

I assume scipy.optimize.curve_fit does not work due to least_squares is used. That way it will be centered around the data points.

from scipy.optimize import curve_fit  

def steps(m, FL1, m1, FL2, m2, FL3):
    if m > m1:
        return FL1
    if mass > m2:
        return FL2
    else:
        return FL3

x_data = [8000, 7000, 6000, 5000]  
y_data = [300, 350, 400, 450]  

popt, _ = curve_fit(steps, x_data, y_data)  

This fails:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It does not seem to like the if statements in the function.


Solution

  • Your optimization only has 2 parameters to optimize; m1, m2. The height of the step follows from your interpolated data. For example, FL1 == y_data[0] always.

    You'll need to express this in your function, and you could use it in curve_fit. It also expects you to vectorize your function, but i just couldn't be bothered.

    If we express this function

    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    import numpy as np
    
    x_data = np.linspace(8000, 5000, 10)
    y_data = np.array([400, 420, 440, 450, 453, 454, 452, 453, 453, 452])
    
    
    def limit(m):
        return np.interp(m, np.flip(x_data), np.flip(y_data))
    
    
    def constrained_steps(m, m1, m2):
        results = list()  # Could be vectorized if you cared enough.
        for i in range(len(m)):
            if m[i] > m1:
                results.append(y_data[0])
            elif m[i] > m2:
                results.append(limit(m1))
            else:
                results.append(limit(m2))
        return np.array(results)
    
    
    # Initialize initial guess at 2/3 and 1/3's through the x_data.
    bounds = x_data[[-1,0]]
    p0 = ((bounds[1]-bounds[0])*2/3 + bounds[0], (bounds[1]-bounds[0])*1/3 + bounds[0])
    popt, _ = curve_fit(constrained_steps, x_data, y_data, p0=p0, bounds=bounds)
    opt_m1, opt_m2 = popt
    
    ms = np.linspace(bounds[0], bounds[1], 1000)
    plt.scatter(x_data, y_data)
    plt.plot(ms, constrained_steps(ms, opt_m1, opt_m2))
    plt.gca().invert_xaxis()
    plt.show()