I'm running a nonlinear least squares fit using the minpack.lm package. However, for each group in the data I would like to optimize (minimize) the fitting parameters, similar to Python's minimize function.
The minimize() function is a wrapper around Minimizer for running an optimization problem. It takes an objective function (the function that calculates the array to be minimized), a Parameters object, and several optional arguments.
The reason I need this is that I want to optimize the fitting function based on the obtained per-group fitting parameters, in order to find global fitting parameters that can fit both groups in the data.
Here is my current approach for fitting in groups:
df <- data.frame(y = c(replicate(2, c(rnorm(10, 0.18, 0.01), rnorm(10, 0.17, 0.01))),
                       c(replicate(2, c(rnorm(10, 0.27, 0.01), rnorm(10, 0.26, 0.01))))),
                 DVD = c(replicate(4, c(rnorm(10, 60, 2), rnorm(10, 80, 2)))),
                 gr = rep(seq(1, 2), each = 40),
                 logic = rep(c(1, 0), each = 40))
The fitting function for these groups is:
fitt <- function(data) {
  fit <- nlsLM(y ~ pi * label2 * (DVD/2 + U1)^2,
               data = data, start = c(label2 = 1, U1 = 4), trace = TRUE,
               control = nls.lm.control(maxiter = 130))
}
library(minpack.lm)
library(plyr)  # will help to fit in groups
fit <- dlply(df, c('gr'), .fun = fitt)  # grouped by 'gr' only
> fit
$`1`
Nonlinear regression model
  model: y ~ pi * label2 * (DVD/2 + U1)^2
   data: data
   label2        U1
2.005e-05 1.630e+03

$`2`
 label2      U1
  2.654 -35.104
I need to know whether there is any function that optimizes the sum of squares to get the best fit for both groups at once. You might say the per-group fits above already give the best parameters in the residual sum-of-squares sense, but I know a minimizer can do this globally; I just haven't found any similar example of how to do it in R.
P.S. I made up the numbers and the fitting function.
Not sure about R, but least squares with shared parameters is usually simple to implement.
A simple Python example looks like this:
from matplotlib import pyplot as plt
from random import random
from scipy import optimize
import numpy as np
# just for my normally distributed errors (Box-Muller transform)
def boxmuller(x0, sigma):
    u1 = random()
    u2 = random()
    ll = np.sqrt(-2 * np.log(u1))
    z0 = ll * np.cos(2 * np.pi * u2)
    z1 = ll * np.sin(2 * np.pi * u2)
    return sigma * z0 + x0, sigma * z1 + x0

# some non-linear function
def f0(x, a, b, c, s=0.05):
    return a * np.sqrt(x**2 + b**2) - np.log(c**2 + x) + boxmuller(0, s)[0]
# residual function for least squares; takes two data sets,
# not necessarily of the same length.
# two of the three model parameters (a, b) are common; c differs per set.
def residuals(parameters, l1, l2, dataPoints):
    a, b, c1, c2 = parameters
    set1 = dataPoints[:l1]
    set2 = dataPoints[-l2:]
    distance1 = [(a * np.sqrt(x**2 + b**2) - np.log(c1**2 + x)) - y for x, y in set1]
    distance2 = [(a * np.sqrt(x**2 + b**2) - np.log(c2**2 + x)) - y for x, y in set2]
    res = distance1 + distance2
    return res
xList0 = np.linspace(0, 8, 50)
# some xy data
xList1 = np.linspace(0, 7, 25)
data1 = np.array([f0(x, 1.2, 2.3, .33) for x in xList1])
# more xy data using a different third parameter
xList2 = np.linspace(0.1, 7.5, 28)
data2 = np.array([f0(x, 1.2, 2.3, .77) for x in xList2])
alldata = np.array(list(zip(xList1, data1)) + list(zip(xList2, data2)))
# rough estimates
estimate = [1, 1, 1, .1]
# fitting; providing the second length is actually redundant
bestFitValues, ier = optimize.leastsq(residuals, estimate, args=(len(data1), len(data2), alldata))
print(bestFitValues)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(xList1, data1)
ax.scatter(xList2, data2)
ax.plot(xList0, [f0(x, bestFitValues[0], bestFitValues[1], bestFitValues[2], s=0) for x in xList0])
ax.plot(xList0, [f0(x, bestFitValues[0], bestFitValues[1], bestFitValues[3], s=0) for x in xList0])
plt.show()
#output
>> [ 1.19841984 2.31591587 0.34936418 0.7998094 ]
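The same shared-parameter idea can be translated back to R with minpack.lm::nls.lm, which, much like lmfit's minimize(), takes a residual function and a set of starting parameters. Below is a minimal sketch using the df from the question; it assumes label2 is shared across both groups while U1 is allowed to differ, and the names resid_all, U1_1 and U1_2 are made up for this sketch (swap which parameters are shared as needed):

library(minpack.lm)

# residuals of both groups stacked into one vector;
# label2 is shared, U1_1 / U1_2 are group-specific (hypothetical split)
resid_all <- function(par, data) {
  U1   <- ifelse(data$gr == 1, par$U1_1, par$U1_2)
  pred <- pi * par$label2 * (data$DVD / 2 + U1)^2
  data$y - pred
}

fit_global <- nls.lm(par = list(label2 = 1, U1_1 = 4, U1_2 = 4),
                     fn = resid_all, data = df,
                     control = nls.lm.control(maxiter = 130))
summary(fit_global)

nls.lm minimizes the sum of squares of whatever vector fn returns, so stacking the residuals of both groups into one vector is all that is needed to couple the groups through the shared parameter.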
If required, you can even write the minimization yourself. If your parameter space is reasonably well behaved, i.e. has an approximately parabolic minimum, a simple Nelder-Mead method works quite well; see the sketch below.
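Short of hand-coding the algorithm, the same global fit can be done in R by handing a summed squared-residuals function to optim(), whose default method is Nelder-Mead. This is only a sketch under the same assumed parameter split as above (shared label2, group-specific U1_1 / U1_2):

# sum of squared residuals over both groups
sse <- function(par, data) {
  # par = c(label2, U1_1, U1_2)
  U1   <- ifelse(data$gr == 1, par[2], par[3])
  pred <- pi * par[1] * (data$DVD / 2 + U1)^2
  sum((data$y - pred)^2)
}

fit_nm <- optim(c(label2 = 1, U1_1 = 4, U1_2 = 4), sse, data = df,
                method = "Nelder-Mead",
                control = list(maxit = 2000, reltol = 1e-10))
fit_nm$par

Nelder-Mead does not use gradients, so it is forgiving of rough objective functions, but it converges more slowly than Levenberg-Marquardt near a well-behaved minimum.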