Tags: r, curve-fitting, nls, nonlinear-optimization

Optimized fitting coefficients for better fitting


I'm running a nonlinear least-squares fit using the minpack.lm package.

However, I would like to optimize (minimize) the fitting parameters across the groups in the data, similar to what Python's minimize function does.

The minimize() function is a wrapper around Minimizer for running an optimization problem. It takes an objective function (the function that calculates the array to be minimized), a Parameters object, and several optional arguments.

The reason I need this is that I want to optimize the fitting function, starting from the obtained per-group fitting parameters, to find global fitting parameters that can fit both groups in the data.

Here is my current approach for fitting by group:

df <- data.frame(y = c(replicate(2, c(rnorm(10, 0.18, 0.01), rnorm(10, 0.17, 0.01))),
                       replicate(2, c(rnorm(10, 0.27, 0.01), rnorm(10, 0.26, 0.01)))),
                 DVD = c(replicate(4, c(rnorm(10, 60, 2), rnorm(10, 80, 2)))),
                 gr = rep(seq(1, 2), each = 40),
                 logic = rep(c(1, 0), each = 40))

The fitting function for these groups is:

fitt <- function(data) {
  nlsLM(y ~ pi * label2 * (DVD/2 + U1)^2,
        data = data, start = c(label2 = 1, U1 = 4),
        trace = TRUE, control = nls.lm.control(maxiter = 130))
}

library(minpack.lm)
library(plyr)  # helps with fitting by group

fit <- dlply(df, c('gr'), .fun = fitt)  # one separate fit per level of gr

> fit
$`1`
Nonlinear regression model
  model: y ~ pi * label2 * (DVD/2 + U1)^2
   data: data
   label2        U1 
2.005e-05 1.630e+03 

$`2`
Nonlinear regression model
  model: y ~ pi * label2 * (DVD/2 + U1)^2
   data: data
label2      U1 
 2.654 -35.104   

Is there a function in R that optimizes the overall sum of squares to get the best fit for both groups simultaneously? One could argue that the per-group fits above already minimize the residual sum of squares, but I know lmfit's minimizer can share parameters across data sets, and I haven't found a similar example in R.
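For reference, here is roughly the idea I have in mind, sketched in Python with scipy (illustrative only: the `label2`/`U1` names follow the model above, the data are synthetic stand-ins, and `label2` is shared across both groups):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# synthetic stand-in for the two groups (not the data.frame above)
DVD1 = rng.normal(60, 2, 40)
DVD2 = rng.normal(80, 2, 40)
y1 = np.pi * 2e-5 * (DVD1 / 2 + 30.0) ** 2 + rng.normal(0, 0.005, 40)
y2 = np.pi * 2e-5 * (DVD2 / 2 + 20.0) ** 2 + rng.normal(0, 0.005, 40)

# residuals with label2 shared and a separate U1 per group
def residuals(p):
    label2, u1a, u1b = p
    r1 = np.pi * label2 * (DVD1 / 2 + u1a) ** 2 - y1
    r2 = np.pi * label2 * (DVD2 / 2 + u1b) ** 2 - y2
    return np.concatenate([r1, r2])

fit = least_squares(residuals, x0=[1e-4, 10.0, 10.0])
print(fit.x)  # one global label2 plus a U1 for each group
```

The point is that a single residual vector spans both groups, so one optimizer run produces the global parameter.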


P.S. The numbers and fitting lines above are made up.


Solution

  • Not sure about R, but least squares with shared parameters is usually simple to implement.

    A simple Python example looks like this:

    from matplotlib import pyplot as plt
    
    from random import random
    from scipy import optimize
    import numpy as np
    
    #normally distributed errors via the Box-Muller transform
    def boxmuller(x0,sigma):
        u1=random()
        u2=random()
        ll=np.sqrt(-2*np.log(u1))
        z0=ll*np.cos(2*np.pi*u2)
        z1=ll*np.sin(2*np.pi*u2)
        return sigma*z0+x0, sigma*z1+x0
    
    #some non-linear function
    def f0(x,a,b,c,s=0.05):
        return a*np.sqrt(x**2+b**2)-np.log(c**2+x)+boxmuller(0,s)[0]
    
    # residual function for least squares takes two data sets. 
    # not necessarily same length
    # two of three parameters are common
    def residuals(parameters,l1,l2,dataPoints):
        a,b,c1,c2 = parameters
        set1=dataPoints[:l1]
        set2=dataPoints[-l2:]
        distance1 = [(a*np.sqrt(x**2+b**2)-np.log(c1**2+x))-y for x,y in set1]
        distance2 = [(a*np.sqrt(x**2+b**2)-np.log(c2**2+x))-y for x,y in set2]
        res = distance1+distance2
        return res
    
    xList0=np.linspace(0,8,50)
    #some xy data
    xList1=np.linspace(0,7,25)
    data1=np.array([f0(x,1.2,2.3,.33) for x in xList1])
    #more xy data using different third parameter
    xList2=np.linspace(0.1,7.5,28)
    data2=np.array([f0(x,1.2,2.3,.77) for x in xList2])
    alldata=np.array(list(zip(xList1,data1))+list(zip(xList2,data2)))
    
    # rough estimates
    estimate = [1, 1, 1, .1]
    #fitting; providing the second length is actually redundant
    bestFitValues, ier = optimize.leastsq(residuals, estimate, args=(len(data1), len(data2), alldata))
    print(bestFitValues)
    
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xList1, data1)
    ax.scatter(xList2, data2)
    ax.plot(xList0,[f0(x,bestFitValues[0],bestFitValues[1],bestFitValues[2] ,s=0) for x in xList0])
    ax.plot(xList0,[f0(x,bestFitValues[0],bestFitValues[1],bestFitValues[3] ,s=0) for x in xList0])
    
    
    plt.show()
    
    #output
    >> [ 1.19841984  2.31591587  0.34936418  0.7998094 ]
    

    fit with one nonlinear parameter in common

    If required, you can even write the minimization yourself. If your parameter space is reasonably well behaved, i.e. has an approximately parabolic minimum, a simple Nelder-Mead method works quite well.
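    As a sketch of that do-it-yourself route (illustrative data and names; scipy's Nelder-Mead applied to a plain sum of squares with one shared slope):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)

    # two data sets from y = a*x + c, with a shared slope a and per-set offsets
    x1 = np.linspace(0, 7, 25)
    x2 = np.linspace(0.1, 7.5, 28)
    y1 = 1.2 * x1 + 0.5 + rng.normal(0, 0.05, x1.size)
    y2 = 1.2 * x2 + 1.5 + rng.normal(0, 0.05, x2.size)

    # objective: total sum of squares over both sets
    def sse(p):
        a, c1, c2 = p  # a is shared, c1/c2 belong to one set each
        return np.sum((a * x1 + c1 - y1) ** 2) + np.sum((a * x2 + c2 - y2) ** 2)

    res = minimize(sse, x0=[1.0, 0.0, 0.0], method='Nelder-Mead')
    print(res.x)  # roughly [1.2, 0.5, 1.5]
    ```

    Since Nelder-Mead only evaluates the objective (no gradients), the same pattern works for any residual definition you can write down.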