python scipy curve-fitting spline

Use UnivariateSpline to fit data tightly


I have a bunch of x, y points that represent a sigmoidal function:

x = np.array([1.00094909, 1.08787635, 1.17481363, 1.2617564,  1.34867881, 1.43562284,
              1.52259341, 1.609522,   1.69631283, 1.78276102, 1.86426648, 1.92896789,
              1.9464453,  1.94941586, 2.00062852, 2.073691,   2.14982808, 2.22808316,
              2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345,  2.69748825])
y = np.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
              -0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392,  0.00063599,
               0.08732999,  0.16429878,  0.2223306,   0.25368884,  0.26830932,  0.27313931,
               0.27308756,  0.27048902,  0.26626313,  0.26139534,  0.25634544,  0.2509893])

[Figure: Data]

I use scipy.interpolate.UnivariateSpline() to fit to some cubic spline as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

s = UnivariateSpline(x, y, k=3, s=0)

xfit = np.linspace(x.min(), x.max(), 200)
plt.scatter(x,y)
plt.plot(xfit, s(xfit))
plt.show()

This is what I get: [Figure: Fit]

Since I specify s=0, the spline adheres completely to the data, but there are too many wiggles. Using a higher k value leads to even more wiggles.
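
For comparison, here is a minimal sketch of how the smoothing factor `s` changes the spline. The value `s=1e-3` is an assumption picked for illustration, not a tuned or recommended setting, and the variable names `interp` and `smooth` are made up for this example.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.array([1.00094909, 1.08787635, 1.17481363, 1.2617564,  1.34867881, 1.43562284,
              1.52259341, 1.609522,   1.69631283, 1.78276102, 1.86426648, 1.92896789,
              1.9464453,  1.94941586, 2.00062852, 2.073691,   2.14982808, 2.22808316,
              2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345,  2.69748825])
y = np.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
              -0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392,  0.00063599,
               0.08732999,  0.16429878,  0.2223306,   0.25368884,  0.26830932,  0.27313931,
               0.27308756,  0.27048902,  0.26626313,  0.26139534,  0.25634544,  0.2509893])

# s=0 forces the spline through every point (pure interpolation).
interp = UnivariateSpline(x, y, k=3, s=0)

# s>0 allows a bounded sum of squared residuals in exchange for fewer knots.
# 1e-3 is an illustrative guess and would need tuning against the data.
smooth = UnivariateSpline(x, y, k=3, s=1e-3)

n_knots_interp = len(interp.get_knots())
n_knots_smooth = len(smooth.get_knots())
print("knots:", n_knots_interp, "vs", n_knots_smooth)
print("sum of squared residuals:", interp.get_residual(), "vs", smooth.get_residual())
```

Plotting `smooth(xfit)` next to `interp(xfit)` should show whether a given `s` removes the wiggles without flattening the jump near x = 1.95.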

So my questions are:

  1. How should I correctly use scipy.interpolate.UnivariateSpline() to fit my data? More precisely, how do I make the spline minimise its wiggling?
  2. Is a spline even the correct choice for this kind of sigmoidal function? Should I be using something like scipy.optimize.curve_fit() with a trial tanh(x) function instead?
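
The `curve_fit()` alternative from question 2 could look roughly like the sketch below. The four-parameter form of `tanh_model` and the starting values in `p0` are assumptions made for illustration. A plain tanh also cannot follow the slow decline of the upper plateau, so this is a starting point rather than a finished model.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.00094909, 1.08787635, 1.17481363, 1.2617564,  1.34867881, 1.43562284,
              1.52259341, 1.609522,   1.69631283, 1.78276102, 1.86426648, 1.92896789,
              1.9464453,  1.94941586, 2.00062852, 2.073691,   2.14982808, 2.22808316,
              2.30634034, 2.38456905, 2.46280126, 2.54106611, 2.6193345,  2.69748825])
y = np.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
              -0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392,  0.00063599,
               0.08732999,  0.16429878,  0.2223306,   0.25368884,  0.26830932,  0.27313931,
               0.27308756,  0.27048902,  0.26626313,  0.26139534,  0.25634544,  0.2509893])


def tanh_model(x_in, amplitude, steepness, center, offset):
    """Hypothetical sigmoid: offset + amplitude * tanh(steepness * (x - center))."""
    return offset + amplitude * np.tanh(steepness * (x_in - center))


# Rough starting values read off the data (assumptions, not fitted results):
# half the plateau-to-plateau height, a guessed steepness, the location of
# the jump, and the level midway between the two plateaus.
p0 = [0.19, 10.0, 1.95, 0.08]

popt, pcov = curve_fit(tanh_model, x, y, p0=p0, maxfev=10000)
rmse = np.sqrt(np.mean((tanh_model(x, *popt) - y) ** 2))
print("fitted parameters:", popt)
print("RMSE:", rmse)
```

Unlike a spline, this gives a handful of interpretable parameters, at the cost of forcing the data into one fixed shape.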

Solution

  • This example fits the two halves of the data with different functions: the lower function is fitted to all points with X < 2.0 and the upper function to all points with X >= 1.9, so the two fits share the points in the overlap region. The code switches from one equation to the other at the center of that overlap, X = 1.95.

    [Figure: combined_model.png]

    import numpy, matplotlib
    import matplotlib.pyplot as plt
    
    xData=numpy.array([ 1.00094909,  1.08787635,  1.17481363,  1.2617564,   1.34867881,  1.43562284,
      1.52259341,  1.609522,    1.69631283,  1.78276102,  1.86426648,  1.92896789,
      1.9464453,   1.94941586,  2.00062852,  2.073691,    2.14982808,  2.22808316,
      2.30634034,  2.38456905,  2.46280126,  2.54106611,  2.6193345,   2.69748825])
    yData=numpy.array([-0.10057627, -0.10172142, -0.10320428, -0.10378959, -0.10348456, -0.10312503,
     -0.10276956, -0.10170055, -0.09778279, -0.08608644, -0.05797392,  0.00063599,
      0.08732999,  0.16429878,  0.2223306,   0.25368884,  0.26830932,  0.27313931,
      0.27308756,  0.27048902,  0.26626313,  0.26139534,  0.25634544,  0.2509893 ])
    
    
    # function for x < 1.95 (fitted up to 2.0 for overlap)
    def lowerFunc(x_in): # Bleasdale-Nelder Power With Offset
        # coefficients
        a = -1.1431476643503597E+03
        b = 3.3819340844164983E+21
        c = -6.3633178925040745E+01
        d = 3.1481973843740194E+00
        Offset = -1.0300724909782859E-01
    
        temp = numpy.power(a + b * numpy.power(x_in, c), -1.0 / d)
        temp += Offset
        return temp
    
    # function for x >= 1.95 (fitted down to 1.9 for overlap)
    def upperFunc(x_in): # rational equation with Offset
        # coefficients
        a = -2.5294212380048242E-01
        b = 1.4262697377369586E+00
        c = -2.6141935706529118E-01
        d = -8.8730045918252121E-02
        Offset = -4.8283287597672708E-01
    
        temp = (a * numpy.power(x_in, 2) + b * numpy.log(x_in)) # numerator
        temp /= (1.0 + c * numpy.power(numpy.log(x_in), -1) + d * numpy.exp(x_in)) # denominator
        temp += Offset
        return temp
    
    
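    def combinedFunc(x_in):
        # evaluate point by point: lowerFunc below the switch point X = 1.95,
        # upperFunc at and above it
        returnVal = []
        for x in x_in:
            if x < 1.95:
                returnVal.append(lowerFunc(x))
            else:
                returnVal.append(upperFunc(x))
        return numpy.array(returnVal) # array, so arithmetic with yData is well defined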
    
    
    modelPredictions = combinedFunc(xData) 
    
    absError = modelPredictions - yData
    
    SE = numpy.square(absError) # squared errors
    MSE = numpy.mean(SE) # mean squared errors
    RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)
    
    
    ##########################################################
    # graphics output section
    def ModelAndScatterPlot(graphWidth, graphHeight):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
        axes = f.add_subplot(111)
    
        # first the raw data as a scatter plot
        axes.plot(xData, yData,  'D')
    
        # create data for the fitted equation plot
        xModel = numpy.linspace(min(xData), max(xData))
        yModel = combinedFunc(xModel)
    
        # now the model as a line plot
        axes.plot(xModel, yModel)
    
        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label
    
        plt.show()
        plt.close('all') # clean up after using pyplot
    
    graphWidth = 800
    graphHeight = 600
    ModelAndScatterPlot(graphWidth, graphHeight)
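
The point-by-point loop in `combinedFunc` could also be written with `numpy.piecewise`, as in the sketch below. The two equations and their coefficients are copied from the code above; `combinedFuncVectorized` and its `switch` parameter are hypothetical names introduced only for this example. `numpy.piecewise` calls each function only on the elements where its condition holds, so `lowerFunc` is never evaluated above the switch point, where its power term can return NaN.

```python
import numpy


def lowerFunc(x_in):  # Bleasdale-Nelder Power With Offset, coefficients from the answer
    a = -1.1431476643503597E+03
    b = 3.3819340844164983E+21
    c = -6.3633178925040745E+01
    d = 3.1481973843740194E+00
    Offset = -1.0300724909782859E-01
    return numpy.power(a + b * numpy.power(x_in, c), -1.0 / d) + Offset


def upperFunc(x_in):  # rational equation with Offset, coefficients from the answer
    a = -2.5294212380048242E-01
    b = 1.4262697377369586E+00
    c = -2.6141935706529118E-01
    d = -8.8730045918252121E-02
    Offset = -4.8283287597672708E-01
    numerator = a * numpy.power(x_in, 2) + b * numpy.log(x_in)
    denominator = 1.0 + c * numpy.power(numpy.log(x_in), -1) + d * numpy.exp(x_in)
    return numerator / denominator + Offset


def combinedFuncVectorized(x_in, switch=1.95):
    """Hypothetical helper: lowerFunc below `switch`, upperFunc at and above it."""
    x_arr = numpy.asarray(x_in, dtype=float)
    return numpy.piecewise(x_arr, [x_arr < switch, x_arr >= switch],
                           [lowerFunc, upperFunc])


# Evaluate on a dense grid spanning roughly the range of the data.
xModel = numpy.linspace(1.0, 2.7, 200)
yModel = combinedFuncVectorized(xModel)
print(yModel[:3], yModel[-3:])
```

The result is a NumPy array, so it can be passed straight to `axes.plot` or subtracted from `yData` without conversion.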