Search code examples
pythonscipycurve-fittingscipy-optimize

Python: How do I fit a line to a specific interval of data?


I am trying to fit a line to the 9.0 to 10.0 um regime of my data set. Here is my plot: enter image description here

Unfortunately, it's a scatter plot with the x values not being indexed from small numbers to large numbers so I can't just apply the optimize.curve_fit function to a specific range of indices to get the desired range in x values.

Below is my go-to procedure for curve fitting. How would I modify it to only get a fit for the 9.0 to 10.0 um x-value range (in my case, the x_dist variable) which has points scattered randomly throughout the indices?

    def func(x,a,b):                                # Define your fitting function
    return a*x+b                                  
  
initialguess = [-14.0, 0.05]                     # initial guess for the parameters of the function func

fit, covariance = optimize.curve_fit(             # call to the fitting routine curve_fit.  Returns optimal values of the fit parameters, and their estimated variance
        func,                                     # function to fit
        x_dist,                                    # data for independant variable
        xdiff_norm,                                    # data for dependant variable
        initialguess,                             # initial guess of fit parameters
        )                                     # uncertainty in dependant variable

print("linear coefficient:",fit[0],"+-",np.sqrt(covariance[0][0])) #print value and one std deviation of first fit parameter
print("offset coefficient:",fit[1],"+-",np.sqrt(covariance[1][1]))     #print value and one std deviation of second fit parameter

print(covariance)

Solution

  • You correctly identified that the problem arises because your x-value data are not ordered. You can address this problem differently. One way is to use Boolean masks to filter out the unwanted values. I tried to be as close as possible to your example:

    from matplotlib import pyplot as plt
    import numpy as np
    from scipy import optimize
    
    #fake data generation
    np.random.seed(1234)
    arr = np.linspace(0, 15, 100).reshape(2, 50)
    arr[1, :] = np.random.random(50)
    arr[1, 20:45] += 2 * arr[0, 20:45] -5
    rng = np.random.default_rng()
    rng.shuffle(arr, axis = 1)
    x_dist = arr[0, :]
    xdiff_norm = arr[1, :]
    
    def func(x, a, b):                              
        return a * x + b      
    
    initialguess = [5, 3]
    mask = (x_dist>2.5) & (x_dist<6.6)
    fit, covariance = optimize.curve_fit(           
            func,                                     
            x_dist[mask],   
            xdiff_norm[mask],    
            initialguess)   
    
    plt.scatter(x_dist, xdiff_norm, label="data")
    x_fit = np.linspace(x_dist[mask].min(), x_dist[mask].max(), 100)
    y_fit = func(x_fit, *fit)
    plt.plot(x_fit, y_fit, c="red", label="fit")
    plt.legend()
    plt.show()
    

    Sample output: ![enter image description here

    This approach does not modify x_dist and xdiff_norm which might or might not be a good thing for further data evaluation. If you wanted to use a line plot instead of a scatter plot, it might be rather useful to sort your arrays in advance (try a line plot with the above method to see why):

    from matplotlib import pyplot as plt
    import numpy as np
    from scipy import optimize
    
    #fake data generation
    np.random.seed(1234)
    arr = np.linspace(0, 15, 100).reshape(2, 50)
    arr[1, :] = np.random.random(50)
    arr[1, 20:45] += 2 * arr[0, 20:45] -5
    rng = np.random.default_rng()
    rng.shuffle(arr, axis = 1)
    x_dist = arr[0, :]
    xdiff_norm = arr[1, :]
    
    def func(x, a, b):                              
        return a * x + b      
    
    #find indexes of a sorted x_dist array, then sort both arrays based on this index
    ind = x_dist.argsort()
    x_dist = x_dist[ind]
    xdiff_norm = xdiff_norm[ind]
    
    #identify index where linear range starts for normal array indexing
    start = np.argmax(x_dist>2.5)
    stop = np.argmax(x_dist>6.6)
    
    initialguess = [5, 3]
    fit, covariance = optimize.curve_fit(           
            func,                                     
            x_dist[start:stop],   
            xdiff_norm[start:stop],    
            initialguess)   
    
    plt.plot(x_dist, xdiff_norm, label="data")
    x_fit = np.linspace(x_dist[start], x_dist[stop], 100)
    y_fit = func(x_fit, *fit)
    plt.plot(x_fit, y_fit, c="red", ls="--", label="fit")
    plt.legend()
    plt.show()
    

    Sample output (unsurprisingly not much different): ![enter image description here